Final Report

Group 3

Author

尹子維、李承祐、黃亮臻、張立勳

Published

June 24, 2024

Introduction

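The data used in this report come from the DepMap Portal, which is run by the DepMap Consortium (DMC). The consortium aims to accelerate the development of precision cancer medicine. It has built systematic datasets and provides a range of tools for further analysis and visualization.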

Figure 1. DepMap Portal

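The data used in this report consist of three datasets: gene expression, metabolite concentrations, and cancer cell metastasis (target organs). They contain 1406, 264, and 264 cell lines respectively. The data were generated by the following experimental workflow (“Depmap” 2024):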

Figure 2. Metastasis Map of Human Cancer Cell Lines

The gene expression dataset contains 1406 cell lines and 19221 genes (normalized RNA-seq expression values). It is summarized below:

Figure 3. Overview of Genes Dataset
Table 1. Overview of Genes Dataset
          Cell Lines (rows)                Genes (columns)
          Complete  Contains 0  Total      Complete  Contains 0  Total
Number    0         1406        1406       7959      11262       19221

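Table 1 shows that the data contain many zeros. Every cell line has at least one gene with zero expression, and 11262 of the 19221 genes are zero in at least one cell line.

An undetected gene (expression of 0) can have several causes:
- The gene may be inherently hard to sequence. It would then also be hard to analyze downstream, so it can be removed.
- The zero may come from the sequencing process. In that case the values can be adjusted for that cause.
- The gene may genuinely not be expressed in those samples. Then no adjustment is needed.

Since we cannot tell which cause applies, we keep the data for now and deal with it in the Quality Control stage.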

The metabolite dataset contains 265 cell lines and 225 metabolites (log-transformed concentrations), including 41 duplicated cell lines. It is summarized below:

Figure 4. Overview of Metabolites Dataset
Table 2. Overview of Metabolites Dataset
          Cell Lines (rows)                Metabolites (columns)
          Complete  Contains 0  Total      Complete  Contains 0  Total
Number    224       0           224        225       0           225

Table 2 shows that no value in this dataset is 0.

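The metastasis dataset contains 265 cell lines. For each one it records the metastasis status for 5 organs (whether the cancer cells metastasized to each organ) and the primary disease. It includes 41 duplicated cell lines and has no missing values.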

Figure 5. Overview of Metastasis Dataset

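The three datasets were merged on DepMap ID, keeping the cell lines common to all three. The merged data contain 224 cell lines, with no missing values. They include:
- 19221 genes (expression);
- 225 metabolites (concentration);
- the metastasis status for 5 organs;
- the primary disease of each cell line.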

Quality Control

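For the gene expression data, we first removed 1526 low-expression genes. A gene is kept only if at least 2 cell lines have expression above \(0.5\). We then removed 13268 genes with extreme expression values (beyond 3 standard deviations). The remaining data contain 224 cell lines and 4427 genes (Phipson 2024).

Figure 6 gives an example. Only one cell line has TNMD (64102) expression above \(0.5\) (the value is \(0.584962501\)), so TNMD (64102) is removed.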

Figure 6. Demonstration of Low Expression Gene
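The two filters above can be sketched in pandas. This is a minimal sketch, not the authors' code, and it makes two assumptions:
- The data are a DataFrame with cell lines as rows and genes as columns.
- "Beyond 3 standard deviations" means any value more than 3 sample SDs from that gene's mean.

The function name `filter_genes` and its parameter names are illustrative.

```python
import pandas as pd


def filter_genes(expr: pd.DataFrame, low_cut: float = 0.5, min_count: int = 2,
                 sd_mult: float = 3.0) -> pd.DataFrame:
    """expr: rows = cell lines, columns = genes (normalized expression)."""
    # Step 1: drop low-expression genes (fewer than `min_count` cell lines above `low_cut`)
    expressed = (expr > low_cut).sum(axis=0) >= min_count
    expr = expr.loc[:, expressed]
    # Step 2: drop genes with any value more than `sd_mult` sample SDs from the gene's mean.
    # Constant genes give 0/0 = NaN z-scores; treat them as having no extreme values.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    no_extremes = (z.abs().fillna(0.0) <= sd_mult).all(axis=0)
    return expr.loc[:, no_extremes]
```

Under this reading, a gene like TNMD (64102) fails the first filter because only one of its values exceeds 0.5.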

Summary Statistics & Data Visualization

Plots of Statistics

Bar Chart of Metastasis on 5 Organs

Figure 7. Stacked Bar Chart of Metastasis on 5 Organs

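Figure 7 shows that metastasis status is clearly imbalanced for Brain, while the other organs are relatively balanced. This has to be taken into account later when the data are split into training and test sets. The split should preserve the class proportions.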

Bar Chart of Statistics about Gene Expression

Figure 8. Bar Chart of Statistics about Gene Expression

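Figure 8 summarizes the gene expression data:
- **Mean:** most genes have a mean expression between 0 and 8, but a few have much higher expression (\(>15\)). Most genes therefore differ little from one another in average expression, and only a small group stands out.
- **Variance:** most genes have a variance above 1, and a few approach 15. Expression therefore varies considerably across cell lines for most genes.
- **Kurtosis:** this is consistent with the variance. Most genes have (excess) kurtosis below 0, so their distributions are flatter and lighter-tailed than a normal distribution, that is, spread out rather than peaked.
- **Skewness:** a small number of genes show slight left or right skew.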

Bar Chart of Statistics about Metabolites

Figure 9. Bar Chart of Statistics about Metabolites

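Figure 9 summarizes the metabolite data:
- **Mean:** most metabolites have a mean concentration between 5.8 and 6, and a few are lower (\(<5.7\)). Overall, metabolites differ little from one another in average concentration.
- **Variance:** most metabolites have a variance below 0.5, and only a few exceed 1.5. Most concentrations therefore vary little across cell lines.
- **Kurtosis:** this is consistent with the variance. Most metabolites have (excess) kurtosis above 0, so their values are concentrated around the center, with heavier tails than a normal distribution.
- **Skewness:** most metabolites show slight left or right skew.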
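The per-column statistics in Figures 8 and 9 can be computed with pandas. This is a minimal sketch, assuming rows are cell lines and columns are genes or metabolites; `column_summary` is a hypothetical name.

pandas' `kurt()` reports excess kurtosis, which is 0 for a normal distribution. That is the convention the "below 0 / above 0" readings above rely on. Whether the figures used the same convention is an assumption.

```python
import pandas as pd


def column_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column (per-gene or per-metabolite) statistics across cell lines."""
    return pd.DataFrame({
        "Mean": df.mean(),
        "Variance": df.var(),    # sample variance (ddof=1)
        "Skewness": df.skew(),   # bias-corrected sample skewness
        "Kurtosis": df.kurt(),   # excess kurtosis: 0 for a normal distribution
    })
```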

Machine Learning Analysis

Objective

Predict whether metastasis occurs at each of five sites (Bone, Brain, Kidney, Liver, Lung).

Methods

  • Deep learning: Multi-head Neural Network
  • Machine learning: Decision Tree, Random Forest, LightGBM, CatBoost

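Preprocessing

Train/Test Split

The data are split 9:1 into training and test sets. Taking the metastasis status of all five targets together, 28 distinct combinations appear in the data, out of \(2^5 = 32\) possible. We stratify the split on these combinations so that each combination's proportion stays as similar as possible between the training and test sets.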

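The figure above shows that Brain has some class imbalance (a ratio of about 1.89 to 1), while the imbalance in the other targets is not pronounced. Later analyses therefore try oversampling the Brain target with SMOTE.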
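The stratified split on the combined metastasis pattern can be sketched as follows. This is a minimal sketch, not the authors' code. It encodes the five targets as one pattern label (for example "10110") and sends roughly 10% of each pattern to the test set. Patterns with very few cell lines contribute no test samples, which is one way to keep the proportions "as consistent as possible".

By contrast, scikit-learn's `train_test_split(..., stratify=...)` raises an error in two cases:
- when a class has fewer than 2 members;
- when the test set is smaller than the number of classes (28 patterns versus about 23 test samples here).

`stratified_split` and `TARGETS` are illustrative names.

```python
import numpy as np
import pandas as pd

TARGETS = ["Bone", "Brain", "Kidney", "Liver", "Lung"]


def stratified_split(Y: pd.DataFrame, test_frac: float = 0.1, seed: int = 0):
    """Return (train_index, test_index) so each metastasis pattern keeps ~test_frac in the test set."""
    rng = np.random.default_rng(seed)
    # Encode the five binary targets as one pattern label, e.g. "10110"
    pattern = Y[TARGETS].astype(int).astype(str).apply("".join, axis=1)
    test_idx = []
    for _, members in pattern.groupby(pattern):
        n_test = int(round(len(members) * test_frac))
        if n_test > 0:
            test_idx.extend(rng.choice(members.index.to_numpy(), size=n_test, replace=False))
    test_idx = pd.Index(test_idx)
    train_idx = Y.index.difference(test_idx)
    return train_idx, test_idx
```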

Multi-head Neural Network

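Objective

Build a single model that predicts multiple target variables at the same time.

Model Design

Shared Layer

A fully connected layer with a ReLU activation. Its main role is to extract general features from the input data. The output dimension of this layer is 128.

Five Independent Output Layers

After the shared layer, the architecture branches into five independent output layers. Each consists of one fully connected layer and predicts one specific target variable. This design lets the model learn and predict each target separately while still building on the shared features. The shared features help the model capture possible latent relationships among the targets.

Loss Function

Binary cross-entropy is used as the loss function and is computed on the prediction of each output layer. Because this is a multi-target task, the model sums the losses of all output layers. It then uses this total for gradient descent to update the network weights.

Optimizer

Adam

Learning Rate

0.0005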

The multi-head neural network has \((4659 + 1) \times 128 + (128 + 1) \times 5 = 597125\) parameters to estimate.
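A PyTorch sketch of this architecture follows. It assumes 4659 input features, the figure used in the parameter count above. The names `MultiHeadNet` and `train_step` are illustrative.

The sketch uses `BCEWithLogitsLoss`, which applies the sigmoid inside the loss. It is equivalent to a sigmoid output followed by binary cross-entropy, but more numerically stable. The report does not say which form was used.

```python
import torch
import torch.nn as nn


class MultiHeadNet(nn.Module):
    """Shared 128-unit ReLU layer followed by five independent one-logit heads."""

    def __init__(self, n_features: int, hidden: int = 128, n_heads: int = 5):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_heads)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.shared(x)
        # One logit per target, concatenated to shape (batch, n_heads)
        return torch.cat([head(h) for head in self.heads], dim=1)


def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               x: torch.Tensor, y: torch.Tensor) -> float:
    """One gradient step on the sum of the per-head binary cross-entropy losses."""
    loss_fn = nn.BCEWithLogitsLoss()
    logits = model(x)
    loss = sum(loss_fn(logits[:, k], y[:, k]) for k in range(logits.shape[1]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

With `MultiHeadNet(4659)`, summing `p.numel()` over `model.parameters()` gives 597125, which matches the count above. The optimizer would be `torch.optim.Adam(model.parameters(), lr=0.0005)`.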

Model Performance


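Deep learning models are particularly sensitive to random initialization, especially when the sample size is small. In this experiment, the AUC for the five targets ranged from 0.4 to 0.7. With a poor initialization, the model sometimes tended early in training to predict every sample as positive, or every sample as negative. This made its predictions highly unstable. With a limited sample size, deep learning may therefore not be the best choice.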

Machine Learning Method

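Objective

Build a separate model for each target variable (Bone, Brain, Kidney, Liver, Lung), five models in total, and identify the variables with the greatest influence on each target.

Methods

We want to take interaction effects into account and make feature importance easy to interpret afterward. For both reasons, all models here are tree-based.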

  • Decision Tree
  • Random Forest
  • LightGBM
  • CatBoost
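Fitting one classifier per target and scoring it with the five metrics used in the tables below can be sketched with scikit-learn. This is a minimal sketch with these assumptions:
- Inputs are pandas objects with one column per target.
- The positive class is coded as 1.
- `evaluate_per_target` and `make_model` are hypothetical names.

Any estimator with `fit`, `predict`, and `predict_proba` fits the `make_model` slot. This includes LightGBM's `LGBMClassifier` and CatBoost's `CatBoostClassifier`.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)


def evaluate_per_target(X_train, Y_train, X_test, Y_test, make_model):
    """Fit a fresh model for each target column and return one row of metrics per target."""
    rows = {}
    for target in Y_train.columns:
        model = make_model()
        model.fit(X_train, Y_train[target])
        pred = model.predict(X_test)
        prob = model.predict_proba(X_test)[:, 1]  # probability of class 1 (metastasis)
        y = Y_test[target]
        rows[target] = {
            "Accuracy": accuracy_score(y, pred),
            "Precision": precision_score(y, pred, zero_division=0),
            "Recall": recall_score(y, pred, zero_division=0),
            "F1": f1_score(y, pred, zero_division=0),
            "AUC": roc_auc_score(y, prob),
        }
    return pd.DataFrame(rows).T.round(4)


# Example factory for the random forest rows
def make_random_forest():
    return RandomForestClassifier(n_estimators=500, random_state=0)
```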

Decision Tree

        Accuracy  Precision  Recall      F1     AUC
Bone        0.48     0.3750  0.2727  0.3158  0.4578
Brain       0.52     0.2222  0.2857  0.2500  0.4484
Kidney      0.32     0.3750  0.4615  0.4138  0.3141
Liver       0.60     0.6154  0.6154  0.6154  0.5994
Lung        0.52     0.5385  0.5385  0.5385  0.5192

Random Forest

        Accuracy  Precision  Recall      F1     AUC
Bone        0.64     0.6250  0.4545  0.5263  0.6883
Brain       0.68     0.3333  0.1429  0.2000  0.4206
Kidney      0.64     0.6111  0.8462  0.7097  0.7115
Liver       0.52     0.5333  0.6154  0.5714  0.6538
Lung        0.40     0.4444  0.6154  0.5161  0.3622

LightGBM

        Accuracy  Precision  Recall      F1     AUC
Bone        0.56     0.5000  0.6364  0.5600  0.5455
Brain       0.48     0.1250  0.1429  0.1333  0.4206
Kidney      0.52     0.5238  0.8462  0.6471  0.5577
Liver       0.60     0.6154  0.6154  0.6154  0.5833
Lung        0.48     0.5000  0.7692  0.6061  0.5577

CatBoost

        Accuracy  Precision  Recall      F1     AUC
Bone        0.56     0.5000  0.5455  0.5217  0.6299
Brain       0.64     0.3333  0.2857  0.3077  0.4921
Kidney      0.48     0.5000  0.8462  0.6286  0.5897
Liver       0.52     0.5333  0.6154  0.5714  0.5897
Lung        0.48     0.5000  0.7692  0.6061  0.5641

SMOTE

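Only the Brain target is oversampled. The minority class is raised to 80% of the size of the majority class.

Brain
Original data: metastasis 130, no metastasis 69
After SMOTE: metastasis 130, no metastasis 104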
              Accuracy  Precision  Recall      F1     AUC
Model                                                    
DecisionTree      0.48     0.3750  0.2727  0.3158  0.4578
RandomForest      0.72     0.8333  0.4545  0.5882  0.7695
LightGBM          0.60     0.5714  0.3636  0.4444  0.6558
CatBoost          0.68     0.7143  0.4545  0.5556  0.6234

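After training on the SMOTE-oversampled data, every model's AUC for Brain improved:
- DecisionTree rose only slightly, from 0.4484 to 0.4578.
- CatBoost rose by about 0.13, from 0.4921 to 0.6234.
- RandomForest and LightGBM each rose by more than 0.2, from 0.4206 to 0.7695 and from 0.4206 to 0.6558 respectively.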
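SMOTE creates each synthetic minority sample by interpolating between a minority sample and one of its k nearest minority-class neighbours. The numpy sketch below illustrates this idea and the 80% target: 130 × 0.8 = 104, so 35 new samples are needed.

It is an illustration under assumptions: k = 5 and Euclidean distance, which are also the defaults of imbalanced-learn. In practice this step is usually done with imbalanced-learn's `SMOTE(sampling_strategy=0.8).fit_resample(X, y)`. The report does not say which implementation was used, and `smote_samples` and `oversample_to_ratio` are hypothetical names.

```python
import numpy as np


def smote_samples(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic rows by interpolating minority rows toward minority neighbours."""
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    k = min(k, n - 1)
    # Squared Euclidean distances among minority samples (no (n, n, p) temporary array)
    sq = (X_min ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X_min @ X_min.T
    np.fill_diagonal(d2, np.inf)                    # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]      # k nearest minority neighbours per row
    base = rng.integers(0, n, size=n_new)           # random minority sample for each new row
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                    # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def oversample_to_ratio(X_major, X_minor, ratio=0.8, k=5, seed=0):
    """Synthetic minority rows needed so that minority size = ratio * majority size."""
    n_new = max(int(round(ratio * len(X_major))) - len(X_minor), 0)
    return smote_samples(X_minor, n_new, k=k, seed=seed)
```

Oversampling should be applied to the training set only, so that no synthetic points leak into the test set.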

Highest-AUC Model for Each Target

                   Bone               Brain  ...         Liver      Lung
Accuracy           0.64                0.72  ...          0.52      0.48
Precision         0.625              0.8333  ...        0.5333       0.5
Recall           0.4545              0.4545  ...        0.6154    0.7692
F1               0.5263              0.5882  ...        0.5714    0.6061
AUC              0.6883              0.7695  ...        0.6538    0.5641
Source     RandomForest  RandomForest_Smote  ...  RandomForest  CatBoost

[6 rows x 5 columns]

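The best model for predicting metastasis to each site:

  • Bone: the random forest model performs best, with an AUC of 68.83%.
  • Brain: the random forest trained on SMOTE-oversampled data performs best, with an AUC of 76.95%.
  • Kidney: the random forest model performs best, with an AUC of 71.15%.
  • Liver: the random forest model performs best, with an AUC of 65.38%.
  • Lung: the CatBoost model performs best, with an AUC of 56.41%.

Top 20 Variables with the Greatest Influence on Metastasis

  • Predicting metastasis to bone, the important metabolites are: C34.4.PC
  • Predicting metastasis to brain, the important metabolites are: C54.6.TAG, C58.8.TAG, asparagine, C56.6.TAG, C54.4.TAG
  • Predicting metastasis to kidney, the important metabolites are: homocysteine, cytidine
  • Predicting metastasis to liver, the important metabolites are: GABA, putrescine, cytidine
  • Predicting metastasis to lung, the important metabolites are: taurodeoxycholate.taurochenodeoxycholate, C36.1.PC, C54.6.TAG, succinate.methylmalonate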
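Lists like these can be produced by ranking a fitted model's feature importances and keeping the metabolites among the top 20. This is a minimal sketch with these assumptions:
- The model exposes a scikit-learn-style `feature_importances_` attribute. scikit-learn trees and forests, LightGBM's `LGBMClassifier`, and CatBoost's `CatBoostClassifier` all do.
- The list of metabolite column names is known.

The helper names are hypothetical, and the report does not say which importance measure was used.

```python
import pandas as pd


def top_features(model, feature_names, k=20):
    """Rank features by the fitted model's feature_importances_ and keep the top k."""
    importance = pd.Series(model.feature_importances_, index=feature_names)
    return importance.sort_values(ascending=False).head(k)


def metabolites_in_top(model, feature_names, metabolite_names, k=20):
    """Return the metabolites, in importance order, that appear among the top-k features."""
    metabolites = set(metabolite_names)
    return [name for name in top_features(model, feature_names, k).index if name in metabolites]
```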

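Statistical Analysis

In addition to the machine learning methods above, we also analyzed the data with purely statistical methods. Our analysis workflow is as follows.

Analysis Workflow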

  1. Variable selection

flowchart TD
  A[r genes] --> C[r+t genes and metabolites]
  B[t metabolites] --> C
  C --> D["(r+t choose 2) interactions"]
  D --> E["select k variables from r+t+(r+t choose 2)"]

  2. Model building

flowchart TD
  A[Any metastasis] --> C[(Hurdle Model)]
  B[Number of metastasis sites] --> C
  C --> Criterion1[Evaluate by MSE]
  D[Metastasis at each site] --> E[(5 logistic regressions)]
  E --> Criterion2[Evaluate by AUC and Accuracy]

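  1. On variable selection:

    Because p > n in this dataset, variable selection needs careful thought. The bioinformatics literature also tells us that individual genes and metabolites are not the only things that affect the response. Gene-gene, metabolite-metabolite, and gene-metabolite interactions all matter too. Our focus was therefore on how to include these three kinds of interactions in the model efficiently.

    We could put all three kinds of interactions into the model and select among them. However, the number of variables is already very large, so adding every interaction term would clearly be inefficient. To address this, we devised the following procedure:

    1. Select the r most important genes from all genes.
    2. Select the t most important metabolites from all metabolites.
    3. Form interactions only among these r+t genes and metabolites, giving \(\binom{r+t}{2}\) interaction terms.
    4. Finally, select the k most important variables from the \(r+t+\binom{r+t}{2}\) candidates.
  2. Modeling

    From the responses in this dataset, we can define three response variables:

    1. Whether a cell line metastasizes at all

    2. The number of sites a cell line metastasizes to (count response)

    3. Whether metastasis occurs at each of the 5 sites

    Our main interest is the third response. For the first two, we fit a hurdle model, which captures information about both responses at once. For the third, we fit logistic regressions. We check whether these models are effective using a train/test split and AUC.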
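Step 3 of the procedure forms all pairwise products of the r + t pre-selected genes and metabolites. It can be sketched in pandas. This is an illustrative sketch with two assumptions:
- An interaction is the element-wise product of two columns.
- The `A _ B` column naming mirrors the interaction names listed later in this report.

With r + t = 50, this produces \(50 + \binom{50}{2} = 50 + 1225 = 1275\) candidate columns for the final selection step.

```python
from itertools import combinations

import pandas as pd


def add_pairwise_interactions(df: pd.DataFrame) -> pd.DataFrame:
    """Append every pairwise product of df's columns, named 'A _ B'."""
    inter = {f"{a} _ {b}": df[a] * df[b] for a, b in combinations(df.columns, 2)}
    return pd.concat([df, pd.DataFrame(inter, index=df.index)], axis=1)
```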

Variable Selection

Differential Analysis

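Using the primary disease, we split the data into lung cancer and non-lung cancer cell lines. Within each group, we then tested whether gene expression and metabolite concentrations differ between cell lines with and without metastasis to each organ. We split on lung cancer for two reasons:
- Lung is one of the organs whose metastasis we predict. Intuitively, accounting for lung cancer should help the later analysis of lung metastasis.
- Splitting on lung cancer leaves enough samples in each group for the statistical tests (\(>30\)).

Figure 1 illustrates the differential analysis of gene expression for bone metastasis. The gene expression data are split into lung cancer and non-lung cancer cell lines. Within each group, the mean expression of the cell lines with bone metastasis is divided by that of the cell lines without it. The logarithm of this ratio is the log fold change. For example, for gene Gene1 in the lung cancer group,
\[\mathrm{LFC} = \log\left(\frac{\mathrm{Mean}(\text{Metastasis} \mid \text{Lung}, \text{Gene1})}{\mathrm{Mean}(\text{NonMetastasis} \mid \text{Lung}, \text{Gene1})}\right).\]

A p-value is computed for each gene and metabolite with a t-test. The p-value and the fold change together serve as the selection criteria, and the thresholds are adjusted iteratively. The selection proceeds in three stages:
- 50 genes and metabolites are selected to generate interaction terms.
- The top 20 significant genes and metabolites (ranked by p-value) are kept in the final variable set.
- The same procedure, applied to the interaction terms built from the 50 variables, yields about 30 significant interaction terms.

Together with the 20 retained genes and metabolites, this gives about 50 variables for the subsequent models.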

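To predict metastasis to at least one organ, the five organ-level metastasis indicators are combined into a single indicator of whether metastasis occurs in at least one organ. The same procedure then yields about 50 variables for the subsequent models.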

flowchart TD
  A[Gene Dataset] --> B[Lung Cancer]
  A --> C[Non-Lung Cancer]
  B --> D[Bone Metastasis]
  B --> E[Bone Non-Metastasis]
  C --> F[Bone Metastasis]
  C --> G[Bone Non-Metastasis]

Figure 1. Flow Chart of Gene Differential Analysis on Bone Metastasis given Lung Cancer
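The per-variable computation behind the volcano plots can be sketched with scipy. Within one primary-disease group (for example lung cancer) and for one organ, it computes a two-sample t-test p-value and a log fold change of the group means.

Several details are assumptions not stated in the report:
- Welch's t-test (unequal variances);
- a base-2 logarithm;
- the helper names `differential_table` and `select_features`.

The threshold values would be those listed in the criteria tables that follow. The log ratio is only defined when both group means are positive.

```python
import numpy as np
import pandas as pd
from scipy import stats


def differential_table(X: pd.DataFrame, metastasis: pd.Series) -> pd.DataFrame:
    """Per-column log2 fold change and Welch t-test p-value, metastasis (1) vs not (0)."""
    met = X[metastasis == 1]
    non = X[metastasis == 0]
    log2fc = np.log2(met.mean() / non.mean())  # requires positive group means
    _, p_value = stats.ttest_ind(met.to_numpy(), non.to_numpy(), axis=0, equal_var=False)
    return pd.DataFrame({"log2FC": log2fc, "p_value": p_value}, index=X.columns)


def select_features(table: pd.DataFrame, p_cut: float, fc_cut: float) -> list:
    """Keep variables passing both the p-value and the |log2 fold change| thresholds."""
    keep = (table["p_value"] < p_cut) & (table["log2FC"].abs() > fc_cut)
    return list(table.index[keep])
```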

Volcano Plot (Bone)

Figure 2. Volcano Plot of Bone Metastasis on Gene Expression under Lung Cancer

Figure 3. Volcano Plot of Bone Metastasis on Gene Expression under Non-Lung Cancer

Figure 4. Volcano Plot of Bone Metastasis on Metabolites under Lung Cancer

Figure 5. Volcano Plot of Bone Metastasis on Metabolites under Non-Lung Cancer

Figure 6. Volcano Plot of Bone Metastasis on Interaction Effect under Lung Cancer

Figure 7. Volcano Plot of Bone Metastasis on Interaction Effect under Non-Lung Cancer
Table 1. Criteria of Differential Analysis on Bone Metastasis without Interaction Effect
Group             Variables     P.value   Fold.Change
Lung Cancer       Genes         0.015     0.500
Lung Cancer       Metabolites   0.025     0.100
Non-Lung Cancer   Genes         0.015     0.500
Non-Lung Cancer   Metabolites   0.025     0.100
Total number: 48

Table 2. Criteria of Differential Analysis on Bone Metastasis with Interaction Effect
Group             P.value   Fold.Change
Lung Cancer       0.0007    3.0000
Non-Lung Cancer   0.0007    3.0000
Total number: 47

Volcano Plot (Brain)

Figure 8. Volcano Plot of Brain Metastasis on Gene Expression under Lung Cancer

Figure 9. Volcano Plot of Brain Metastasis on Gene Expression under Non-Lung Cancer

Figure 10. Volcano Plot of Brain Metastasis on Metabolites under Lung Cancer

Figure 11. Volcano Plot of Brain Metastasis on Metabolites under Non-Lung Cancer

Figure 12. Volcano Plot of Brain Metastasis on Interaction Effect under Lung Cancer

Figure 13. Volcano Plot of Brain Metastasis on Interaction Effect under Non-Lung Cancer
Table 3. Criteria of Differential Analysis on Brain Metastasis without Interaction Effect
Group             Variables     P.value   Fold.Change
Lung Cancer       Genes         0.005     0.250
Lung Cancer       Metabolites   0.250     0.150
Non-Lung Cancer   Genes         0.005     0.250
Non-Lung Cancer   Metabolites   0.250     0.150
Total number: 53

Table 4. Criteria of Differential Analysis on Brain Metastasis with Interaction Effect
Group             P.value   Fold.Change
Lung Cancer       0.0005    3.2500
Non-Lung Cancer   0.0005    3.2500
Total number: 53

Volcano Plot (Kidney)

Figure 14. Volcano Plot of Kidney Metastasis on Gene Expression under Lung Cancer

Figure 15. Volcano Plot of Kidney Metastasis on Gene Expression under Non-Lung Cancer

Figure 16. Volcano Plot of Kidney Metastasis on Metabolites under Lung Cancer

Figure 17. Volcano Plot of Kidney Metastasis on Metabolites under Non-Lung Cancer

Figure 18. Volcano Plot of Kidney Metastasis on Interaction Effect under Lung Cancer

Figure 19. Volcano Plot of Kidney Metastasis on Interaction Effect under Non-Lung Cancer
Table 5. Criteria of Differential Analysis on Kidney Metastasis without Interaction Effect
Group             Variables     P.value   Fold.Change
Lung Cancer       Genes         0.005     0.500
Lung Cancer       Metabolites   0.005     0.150
Non-Lung Cancer   Genes         0.005     0.500
Non-Lung Cancer   Metabolites   0.005     0.150
Total number: 48

Table 6. Criteria of Differential Analysis on Kidney Metastasis with Interaction Effect
Group             P.value   Fold.Change
Lung Cancer       0.0002    6.0000
Non-Lung Cancer   0.0002    6.0000
Total number: 48

Volcano Plot (Liver)

Figure 20. Volcano Plot of Liver Metastasis on Gene Expression under Lung Cancer

Figure 21. Volcano Plot of Liver Metastasis on Gene Expression under Non-Lung Cancer

Figure 22. Volcano Plot of Liver Metastasis on Metabolites under Lung Cancer

Figure 23. Volcano Plot of Liver Metastasis on Metabolites under Non-Lung Cancer

Figure 24. Volcano Plot of Liver Metastasis on Interaction Effect under Lung Cancer

Figure 25. Volcano Plot of Liver Metastasis on Interaction Effect under Non-Lung Cancer
Table 7. Criteria of Differential Analysis on Liver Metastasis without Interaction Effect
Group             Variables     P.value   Fold.Change
Lung Cancer       Genes         0.015     0.600
Lung Cancer       Metabolites   0.075     0.150
Non-Lung Cancer   Genes         0.015     0.600
Non-Lung Cancer   Metabolites   0.075     0.150
Total number: 53

Table 8. Criteria of Differential Analysis on Liver Metastasis with Interaction Effect
Group             P.value   Fold.Change
Lung Cancer       0.0002    6.0000
Non-Lung Cancer   0.0002    6.0000
Total number: 49

Volcano Plot (Lung)

Figure 26. Volcano Plot of Lung Metastasis on Gene Expression under Lung Cancer

Figure 27. Volcano Plot of Lung Metastasis on Gene Expression under Non-Lung Cancer

Figure 28. Volcano Plot of Lung Metastasis on Metabolites under Lung Cancer

Figure 29. Volcano Plot of Lung Metastasis on Metabolites under Non-Lung Cancer

Figure 30. Volcano Plot of Lung Metastasis on Interaction Effect under Lung Cancer

Figure 31. Volcano Plot of Lung Metastasis on Interaction Effect under Non-Lung Cancer
Table 9. Criteria of Differential Analysis on Lung Metastasis without Interaction Effect
Group             Variables     P.value   Fold.Change
Lung Cancer       Genes         0.015     0.500
Lung Cancer       Metabolites   0.075     0.150
Non-Lung Cancer   Genes         0.015     0.500
Non-Lung Cancer   Metabolites   0.075     0.150
Total number: 51

Table 10. Criteria of Differential Analysis on Lung Metastasis with Interaction Effect
Group             P.value   Fold.Change
Lung Cancer       0.0005    5.0000
Non-Lung Cancer   0.0005    5.0000
Total number: 52

Volcano Plot (Organ)

Figure 32. Volcano Plot of Organ Metastasis on Gene Expression under Lung Cancer

Figure 33. Volcano Plot of Organ Metastasis on Gene Expression under Non-Lung Cancer

Figure 34. Volcano Plot of Organ Metastasis on Metabolites under Lung Cancer

Figure 35. Volcano Plot of Organ Metastasis on Metabolites under Non-Lung Cancer

Figure 36. Volcano Plot of Organ Metastasis on Interaction Effect under Lung Cancer

Figure 37. Volcano Plot of Organ Metastasis on Interaction Effect under Non-Lung Cancer
Table 11. Criteria of Differential Analysis on Organ Metastasis without Interaction Effect
Group             Variables     P.value   Fold.Change
Lung Cancer       Genes         0.004     0.500
Lung Cancer       Metabolites   0.050     0.100
Non-Lung Cancer   Genes         0.004     0.500
Non-Lung Cancer   Metabolites   0.050     0.100
Total number: 52

Table 12. Criteria of Differential Analysis on Organ Metastasis with Interaction Effect
Group             P.value   Fold.Change
Lung Cancer       0.0001    4.0000
Non-Lung Cancer   0.0001    4.0000
Total number: 50

The final selected variables are summarized below:

variables_bone variables_brain variables_kidney variables_liver variables_lung variables_binary
MAGEE1..57692. CDKL3..51265. SMAP1..60682. PNMA8A..55228. BBS5..129880. AMIGO1..57463.
ANKRD18B..441459. ISL2..64843. NLRP1..22861. H2BC5..3017. DBNDD2..55861. NECAB3..63941.
PNPLA4..8228. DZIP1..22873. CYBA..1535. H2BC4..8347. CTSF..8722. SMAP1..60682.
GAMT..2593. MAP10..54627. BEX3..27018. H2AC6..8334. DOCK4..9732. SEPTIN5..5413.
CTSF..8722. FMNL1..752. LRP3..4037. CCDC106..29903. AUTS2..26053. MAPK8IP1..9479.
VAT1..10493. CEBPZ..10153. ERMP1..79956. GATA2..2624. ARL4D..379. ADCY9..115.
IRF5..3663. NLK..51701. SERPINB8..5271. SERPINB8..5271. SEMA3B..7869. CBX2..84733.
MYH10..4628. AREG..374. SLC29A3..55315. VAV1..7409. ERMP1..79956. SIK1B..102724428.
FECH..2235. MAN1A2..10905. AREG..374. TNFRSF14..8764. SPACA6..147650. GAS6..2621.
AREG..374. ZNF607..84775. CXCL8..3576. thymine KCNMA1..3778. homocysteine
acetylcarnitine C46.0.TAG C40.6.PC butyrylcarnitine.isobutyrylcarnitine trimethylamine.N.oxide cytidine
aconitate C58.7.TAG homocysteine cytidine homocysteine C32.2.PC
isocitrate butyrobetaine C18.2.LPC acetylcholine C46.0.TAG C32.0.PC
C18.2.LPC cytidine NADP asparagine cytidine guanosine
C18.0.CE beta.alanine C34.4.PC X2.aminoadipate butyrylcarnitine.isobutyrylcarnitine C36.1.PC
C36.1.DAG C52.5.TAG uridine C18.0.LPE phosphocreatine phosphocreatine
cytidine C56.6.TAG thiamine C48.3.TAG thiamine C18.0.LPE
pyroglutamic.acid C58.8.TAG C18.0.LPE C58.7.TAG niacinamide C36.1.DAG
inositol C54.4.TAG cytidine H2BC5..3017. _ PLEK2..26499. lysine SMAP1..60682. _ PHYH..5264.
C58.8.TAG DZIP1..22873. _ RAB38..23682. CYBA..1535. _ DEF6..50619. H2BC5..3017. _ OAS3..4940. lactose MAPK8IP1..9479. _ SMAP1..60682.
MYH10..4628. _ PNPLA4..8228. MRPS6..64968. _ DZIP1..22873. GPC1..2817. _ ATP9A..10079. H2BC5..3017. _ EFNB2..1948. LRP3..4037. _ GPC1..2817. EFNB2..1948. _ CYBA..1535.
BEX3..27018. _ PNPLA4..8228. MRPS6..64968. _ ISL2..64843. NLRP1..22861. _ CYBA..1535. CDCP1..64866. _ H2BC5..3017. GSTM3..2947. _ IL1R1..3554. NECAB3..63941. _ SMAP1..60682.
PKIG..11142. _ PNPLA4..8228. MID1..4281. _ RC3H2..54542. CTNNAL1..8727. _ CYBA..1535. CCDC106..29903. _ BEX3..27018. GSTM3..2947. _ LRP3..4037. SLC16A3..9123. _ CYBA..1535.
CTSF..8722. _ PNPLA4..8228. MID1..4281. _ NLK..51701. MT2A..4502. _ CYBA..1535. PHYHD1..254295. _ UCHL1..7345. GSN..2934. _ IL1R1..3554. SLC16A3..9123. _ EFNB2..1948.
AS3MT..57412. _ PNPLA4..8228. AREG..374. _ TFPI..7035. MT2A..4502. _ CTNNAL1..8727. EPS8L2..64787. _ H2BC5..3017. AUTS2..26053. _ GPC1..2817. GSN..2934. _ PHYH..5264.
AS3MT..57412. _ CTSF..8722. CEBPZ..10153. _ FAM136A..84908. EFNB2..1948. _ MT2A..4502. H2AC6..8334. _ PLEK2..26499. AUTS2..26053. _ GSTM3..2947. GSN..2934. _ SMAP1..60682.
ANKRD18B..441459. _ PNPLA4..8228. PAIP2..51247. _ FAM136A..84908. RAC2..5880. _ CYBA..1535. H2AC6..8334. _ OAS3..4940. AUTS2..26053. _ GSN..2934. TMEM171..134285. _ CYBA..1535.
ANKRD18B..441459. _ GAMT..2593. PAIP2..51247. _ CEBPZ..10153. LRP3..4037. _ GPC1..2817. H2AC6..8334. _ H2BC5..3017. BBS5..129880. _ GSTM3..2947. PKIG..11142. _ NECAB3..63941.
ANKRD18B..441459. _ MYH10..4628. STIL..6491. _ MID1..4281. DIPK1B..138311. _ GPC1..2817. H2AC6..8334. _ EPS8L2..64787. CTSF..8722. _ GPC1..2817. PKIG..11142. _ GSN..2934.
IRF5..3663. _ GAMT..2593. CEPT1..10390. _ MID1..4281. BEX3..27018. _ ATP9A..10079. H2AC6..8334. _ IRX3..79191. CTSF..8722. _ PCSK1N..27344. MT1E..4493. _ CYBA..1535.
IRF5..3663. _ MYH10..4628. EPC2..26122. _ MID1..4281. BEX3..27018. _ GPC1..2817. H2BC4..8347. _ OAS3..4940. CTSF..8722. _ LRP3..4037. MT1E..4493. _ EFNB2..1948.
RAC3..5881. _ MYH10..4628. CCDC138..165055. _ FAM136A..84908. BEX3..27018. _ SMAP1..60682. H2BC4..8347. _ H2BC5..3017. CTSF..8722. _ GSN..2934. AMIGO1..57463. _ GAMT..2593.
RAC3..5881. _ PKIG..11142. CCDC138..165055. _ MID1..4281. BEX3..27018. _ LRP3..4037. H2BC4..8347. _ EPS8L2..64787. CTSF..8722. _ AUTS2..26053. AMIGO1..57463. _ GSN..2934.
RAC3..5881. _ MAGEE1..57692. CCDC138..165055. _ PAIP2..51247. BEX3..27018. _ FAXC..84553. H2BC4..8347. _ H2AC6..8334. EVL..51466. _ GPC1..2817. B4GALNT4..338707. _ AMIGO1..57463.
cytidine _ MAGEE1..57692. MFN1..55669. _ FAM136A..84908. SMTN..6525. _ CYBA..1535. asparagine _ PNMA8A..55228. SHISA4..149345. _ AUTS2..26053. SEPTIN5..5413. _ SMAP1..60682.
carnosine _ MAGEE1..57692. KAZN..23254. _ NLK..51701. SMTN..6525. _ EFNB2..1948. cis.trans.hydroxyproline _ PNMA8A..55228. PCDHGC3..5098. _ GPC1..2817. FITM2..128486. _ GAMT..2593.
pyroglutamic.acid _ MAGEE1..57692. KAZN..23254. _ MID1..4281. RASA3..22821. _ CYBA..1535. anthranilic.acid _ PNMA8A..55228. PCDHGC3..5098. _ GSN..2934. FITM2..128486. _ GSN..2934.
VAT1..10493. _ FECH..2235. KAZN..23254. _ CEBPZ..10153. RASA3..22821. _ MT2A..4502. acetylcholine _ PNMA8A..55228. PCDHGC3..5098. _ AUTS2..26053. FITM2..128486. _ SEPTIN5..5413.
AREG..374. _ TNFRSF1B..7133. KAZN..23254. _ STIL..6491. RASA3..22821. _ RAC2..5880. butyrylcarnitine.isobutyrylcarnitine _ PNMA8A..55228. PCDHGC3..5098. _ CTSF..8722. CBX2..84733. _ SMAP1..60682.
TNFAIP3..7128. _ TNFRSF1B..7133. KAZN..23254. _ EPC2..26122. EVL..51466. _ GPC1..2817. X2.aminoadipate _ PNMA8A..55228. LTBP3..4054. _ AUTS2..26053. CBX2..84733. _ PKIG..11142.
IRF5..3663. _ FECH..2235. MAN1A2..10905. _ FAM136A..84908. SRC..6714. _ GPC1..2817. hypoxanthine _ H2BC5..3017. SEMA3B..7869. _ EVL..51466. CBX2..84733. _ SHISA4..149345.
IRF5..3663. _ VAT1..10493. MAN1A2..10905. _ MID1..4281. SRC..6714. _ MAP2..4133. taurodeoxycholate.taurochenodeoxycholate _ CCDC106..29903. SMARCD3..6604. _ SEMA3B..7869. guanosine _ AMIGO1..57463.
RAC3..5881. _ VAT1..10493. ZNF607..84775. _ FAM136A..84908. SRC..6714. _ LRP3..4037. taurodeoxycholate.taurochenodeoxycholate _ PNMA8A..55228. DENND3..22898. _ SEMA3B..7869. acetylcholine _ AMIGO1..57463.
RAC3..5881. _ IRF5..3663. ZNF607..84775. _ CEBPZ..10153. SRC..6714. _ TMPRSS13..84000. ornithine _ PNMA8A..55228. NCOA7..135112. _ SEMA3B..7869. C40.6.PC _ AMIGO1..57463.
ARL14..80117. _ TNFRSF1B..7133. ZNF525..170958. _ KAZN..23254. SRC..6714. _ BEX3..27018. lauroylcarnitine _ PNMA8A..55228. LTBP3..4054. _ ERMP1..79956. phosphocreatine _ AMIGO1..57463.
ARL14..80117. _ TNFAIP3..7128. LSM14A..26065. _ FAM136A..84908. SRC..6714. _ EVL..51466. oleylcarnitine _ PNMA8A..55228. VASN..114990. _ ERMP1..79956. C24.0.SM _ AMIGO1..57463.
citrate _ VAT1..10493. LSM14A..26065. _ MID1..4281. SERPINB8..5271. _ AREG..374. C46.2.TAG _ PNMA8A..55228. ARL4D..379. _ SEMA3B..7869. C50.0.TAG _ NECAB3..63941.
NA LSM14A..26065. _ CEBPZ..10153. SERPINB8..5271. _ LIF..3976. C48.3.TAG _ PNMA8A..55228. ARL4D..379. _ ERMP1..79956. C50.0.TAG _ AMIGO1..57463.
NA LSM14A..26065. _ PAIP2..51247. NA VAV1..7409. _ BEX3..27018. ARL4D..379. _ NCOA7..135112. SIK1B..102724428. _ ADCY9..115.
NA LSM14A..26065. _ STIL..6491. NA NA ARL4D..379. _ LTBP3..4054. SIK1B..102724428. _ CBX2..84733.
NA LSM14A..26065. _ CCDC138..165055. NA NA trimethylamine.N.oxide _ ARL4D..379. NA
NA LSM14A..26065. _ MFN1..55669. NA NA asparagine _ ARL4D..379. NA
NA LSM14A..26065. _ MAN1A2..10905. NA NA NA NA

Lasso

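Because there are too many variables, we reduced the dimension of the gene and metabolite data. The first method we tried was PCA, applied to each dataset separately. However, both datasets have far fewer samples than variables, so the first few principal components do not explain most of the variance in the original data. Judging by the eigenvalues, none is greater than 1. We therefore concluded that PCA offers little benefit and turned to another dimension-reduction method.

The variables below were obtained in two lasso steps:
1. Lasso selected 25 variables from the genes and 25 from the metabolites.
2. Lasso was applied again to these 50 variables plus all their second-order interactions, selecting 50 variables.

The columns correspond to different models:
- bone/brain/kidney/liver/lung are separate logistic regression models, one for each of the five metastasis sites.
- binary is a logistic regression whose response is coded 1 if metastasis occurs at any of the sites.
- count is a Poisson regression on the number of sites.

Notably, in every model the variables chosen by lasso are all second-order interaction terms. Every model also contains gene-gene, gene-metabolite, and metabolite-metabolite interactions.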
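Selecting a fixed number of variables with an L1-penalized (lasso) logistic regression can be sketched with scikit-learn. The sketch walks up a grid of penalty strengths until at least k coefficients are non-zero, then keeps the k largest in absolute value.

This is an illustrative sketch only. The "first C with k non-zero coefficients" rule and the name `lasso_select` are assumptions, and the report does not specify its lasso implementation or how the penalty was tuned.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.svm import l1_min_c


def lasso_select(X, y, k, n_grid=60):
    """Return sorted column indices of k variables chosen by L1-penalized logistic regression."""
    Xs = StandardScaler().fit_transform(X)  # the lasso penalty is scale-sensitive
    # Start at the smallest C that gives a non-empty model, then weaken the penalty
    c_grid = l1_min_c(Xs, y, loss="log") * np.logspace(0, 3, n_grid)
    coef = np.zeros(Xs.shape[1])
    for c in c_grid:
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=c, max_iter=1000)
        clf.fit(Xs, y)
        coef = clf.coef_.ravel()
        if np.count_nonzero(coef) >= k:
            break
    # Keep the k coefficients with the largest magnitude
    return np.sort(np.argsort(-np.abs(coef))[:k])
```

Passing the output of the pairwise-interaction step as `X` (with k = 50) mirrors the second selection step described above.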

variables_bone variables_brain variables_kidney variables_liver variables_lung variables_binary variables_count
(Intercept) (Intercept) (Intercept) (Intercept) (Intercept) (Intercept) (Intercept)
EHHADH..1962. _ LAMA4..3910. TAPBPL..55080. _ ZCWPW1..55063. CD44..960. _ ZFX..7543. ELP4..26610. _ P2RX5..5026. SLC1A5..6510. _ ITGAE..3682. ST6GAL1..6480. _ PTPRU..10076. ABCA7..10347. _ CD44..960.
IGFLR1..79713. _ LMCD1..29995. PSTPIP2..9050. _ TNS4..84951. LIPE..3991. _ CD44..960. EXOC2..55770. _ TYRO3..7301. COL12A1..1303. _ WNT5B..81029. TFPT..29844. _ SYNE2..23224. IFT74..80173. _ P2RX5..5026.
IGFLR1..79713. _ PIGT..51604. PLCL2..23228. _ TNS4..84951. P2RX5..5026. _ LIPE..3991. RTN4IP1..84816. _ TYRO3..7301. SERP1..27230. _ CDIPT..10423. ELK3..2004. _ PTPRU..10076. TFPT..29844. _ IFRD1..3475.
RAB15..376267. _ PPDPF..79144. PLCL2..23228. _ GCSH..2653. ELP4..26610. _ CD44..960. TEX2..55852. _ EHHADH..1962. ECHDC2..55268. _ FXYD5..53827. TBP..6908. _ PDE8A..5151. ACCS..84680. _ ITGA6..3655.
RAB15..376267. _ DTNA..1837. RFT1..91869. _ RPL18..6141. SLCO1B3..28234. _ P2RX5..5026. FEM1A..55527. _ SPRY2..10253. EXOSC9..5393. _ SLC1A5..6510. NAAA..27163. _ TRIM2..23321. LYST..1130. _ IFRD1..3475.
LYST..1130. _ EHHADH..1962. NNMT..4837. _ CAPN5..726. TENT5A..55603. _ LIPE..3991. UCHL1..7345. _ TEX2..55852. RHOF..54509. _ DHODH..1723. NAAA..27163. _ LAMA4..3910. NOTCH1..4851. _ P2RX5..5026.
CIART..148523. _ LAMA4..3910. NNMT..4837. _ MAGED4..728239. ZNF513..130557. _ SLC38A5..92745. NOS3..4846. _ P2RX5..5026. RHOF..54509. _ EXOSC9..5393. PSTPIP2..9050. _ ST6GAL1..6480. NOTCH1..4851. _ APOC1..341.
CIART..148523. _ RAB15..376267. NNMT..4837. _ THAP8..199745. SCARA3..51435. _ LIPE..3991. HDX..139324. _ PIK3IP1..113791. GCSH..2653. _ SLC1A5..6510. UCHL1..7345. _ LY75..4065. CAPN5..726. _ TFAP2A..7020.
THAP8..199745. _ SYNE2..23224. SAAL1..113174. _ EML2..24139. LRRC8E..80131. _ ZFX..7543. TRAPPC1..58485. _ ELP4..26610. GCSH..2653. _ ARHGEF2..9181. GPSM1..26086. _ NLRP1..22861. PSTPIP2..9050. _ P2RX5..5026.
THAP8..199745. _ CIART..148523. ZNF608..57507. _ CAPN5..726. FUT10..84750. _ SCARA3..51435. TNFRSF10D..8793. _ FEM1A..55527. LYST..1130. _ FBXL16..146330. GPX8..493869. _ HFE..3077. UCHL1..7345. _ LY75..4065.
SMIM14..201895. _ DTNA..1837. IRS1..3667. _ ZCWPW1..55063. ZNF548..147694. _ ZNF513..130557. PHYHD1..254295. _ FEM1A..55527. UCHL1..7345. _ TNFRSF1B..7133. GPX8..493869. _ PTPRU..10076. UCHL1..7345. _ LAMA4..3910.
SAAL1..113174. _ P2RX5..5026. IRS1..3667. _ TAPBPL..55080. ZNF548..147694. _ SCARA3..51435. TMEM107..84314. _ EXOC2..55770. UCHL1..7345. _ LYST..1130. GPX8..493869. _ PHKA1..5255. UCHL1..7345. _ TFAP2A..7020.
CCDC106..29903. _ LMCD1..29995. IRS1..3667. _ ZNF608..57507. ZNF471..57573. _ TENT5A..55603. ARPIN..348110. _ FEM1A..55527. ZSCAN12..9753. _ TNFRSF1B..7133. GPX8..493869. _ MDFI..4188. ZSCAN12..9753. _ UCHL1..7345.
SESTD1..91404. _ IGFLR1..79713. MYEOV..26579. _ SAAL1..113174. ZNF471..57573. _ SERPINB9..5272. ARPIN..348110. _ PHYHD1..254295. ZSCAN12..9753. _ PLCB4..5332. LRRC8E..80131. _ PDE8A..5151. RFT1..91869. _ ITGAE..3682.
SESTD1..91404. _ CCDC106..29903. C3orf62..375341. _ EML2..24139. ZNF471..57573. _ FUT10..84750. X4.pyridoxate _ alpha.glycerophosphate ZSCAN12..9753. _ UCHL1..7345. KLF11..8462. _ TFPT..29844. GPX8..493869. _ ABCA7..10347.
ZNF286A..57335. _ OAZ1..4946. C3orf62..375341. _ PSTPIP2..9050. TOR4A..54863. _ APOC1..341. cytidine _ TEX2..55852. CBS..875. _ FXYD5..53827. MSRA..4482. _ PHKA1..5255. GPX8..493869. _ LIPE..3991.
ZNF418..147686. _ CCDC106..29903. TCEAL3..85012. _ ZCWPW1..55063. TOR4A..54863. _ SERPINB9..5272. cytidine _ X4.pyridoxate CBS..875. _ SLCO1B3..28234. MSRA..4482. _ NLRP1..22861. GPX8..493869. _ NOTCH1..4851.
TVP23C.CDRT4..100533496. _ SAAL1..113174. TCEAL3..85012. _ CAPN5..726. TOR4A..54863. _ LRRC8E..80131. sorbitol _ cytidine THAP8..199745. _ ZSCAN12..9753. TMEM107..84314. _ GPX8..493869. SAAL1..113174. _ GLTP..51228.
TVP23C.CDRT4..100533496. _ CDC26..246184. TCEAL3..85012. _ NNMT..4837. IRF9..10379. _ ZNF548..147694. thymine _ TEX2..55852. THAP8..199745. _ USP41..373856. KLHL28..54813. _ TFPT..29844. MYEOV..26579. _ GPX8..493869.
dCMP _ OAZ1..4946. ZNF486..90649. _ ZCWPW1..55063. SH3D21..79729. _ TFAP2A..7020. thymine _ cytidine NXF1..10482. _ MRPL24..79590. ZNF829..374899. _ GALNT18..374378. TNFRSF10D..8793. _ ACCS..84680.
hexoses..HILIC.neg. _ cytidine ZNF486..90649. _ CAPN5..726. SH3D21..79729. _ EIF2AK3..9451. taurodeoxycholate.taurochenodeoxycholate _ thymine MYEOV..26579. _ MACROH2A2..55506. ZNF829..374899. _ USP41..373856. CCDC106..29903. _ LAMA4..3910.
lactate _ dCMP ZNF486..90649. _ THAP8..199745. cytidine _ EXOC1..55763. tryptophan _ X2.hydroxyglutarate MYEOV..26579. _ TIAM1..7074. SIPA1L1..26037. _ TFPT..29844. CCDC106..29903. _ TAPBPL..55080.
choline _ homocysteine cytidine _ TAPBPL..55080. cytidine _ KLF11..8462. anthranilic.acid _ HNRNPM..4670. TNFRSF10D..8793. _ P2RX5..5026. SIPA1L1..26037. _ KLF11..8462. TMEM107..84314. _ GLTP..51228.
acetylcholine _ ZNF286A..57335. cytidine _ CAPN5..726. uridine _ EIF2AK3..9451. anthranilic.acid _ glycine TNFRSF10D..8793. _ FXYD5..53827. RANBP17..64901. _ PHLDA1..22822. ZNF471..57573. _ DTNA..1837.
acetylcholine _ TVP23C.CDRT4..100533496. urate _ F1P.F6P.G1P.G6P thiamine _ adipate thiamine _ thymine TNFRSF10D..8793. _ ECHDC2..55268. SH3D21..79729. _ GALNT18..374378. ZNF471..57573. _ UCHL1..7345.
pipecolic.acid _ SYNE2..23224. X3.methyladipate.pimelate _ cytidine thiamine _ uridine adenosine _ FEM1A..55527. TNFRSF10D..8793. _ LRRC8C..84230. SH3D21..79729. _ NAAA..27163. CSF2RA..1438. _ GGACT..87769.
pipecolic.acid _ PIGT..51604. tryptophan _ SAAL1..113174. C18.2.LPC _ uridine X2.deoxyadenosine _ tryptophan TMEM17..200728. _ RHOF..54509. SH3D21..79729. _ UCHL1..7345. APRT..353. _ ITGAE..3682.
pipecolic.acid _ IGFLR1..79713. cis.trans.hydroxyproline _ tryptophan C18.2.LPC _ ornithine X2.deoxyadenosine _ anthranilic.acid ZNF471..57573. _ PLCB4..5332. SH3D21..79729. _ AUTS2..26053. S1PR3..1903. _ NHLRC1..378884.
pipecolic.acid _ hexoses..HILIC.neg. pyroglutamic.acid _ uridine C18.2.LPC _ thiamine methionine.sulfoxide _ cotinine ZNF471..57573. _ WNT5B..81029. SH3D21..79729. _ KLF11..8462. PET117..100303755. _ THAP8..199745.
pipecolic.acid _ taurodeoxycholate.taurochenodeoxycholate pyroglutamic.acid _ X3.methyladipate.pimelate C18.2.LPC _ anserine hexanoylcarnitine _ TRAPPC1..58485. ZNF471..57573. _ UCHL1..7345. C4orf48..401115. _ PHLDA1..22822. MAGEA2..4101. _ P2RX5..5026.
pipecolic.acid _ choline X1.methylnicotinamide _ cytidine C16.0.LPE _ methionine.sulfoxide hexanoylcarnitine _ CDC26..246184. S1PR3..1903. _ ZSCAN12..9753. C4orf48..401115. _ LRRC8E..80131. cystathionine _ CDC26..246184.
hexanoylcarnitine _ OAZ1..4946. valerylcarnitine.isovalerylcarnitine.2.methylbutyroylcarnitine _ GCSH..2653. C18.0.LPE _ methionine.sulfoxide beta.alanine _ TRAPPC1..58485. S1PR3..1903. _ ZNF382..84911. C8orf88..100127983. _ LY75..4065. cytidine _ adipate
hexanoylcarnitine _ CDC26..246184. valerylcarnitine.isovalerylcarnitine.2.methylbutyroylcarnitine _ PSTPIP2..9050. C32.0.PC _ ZFX..7543. C18.0.LPC _ X3.phosphoglycerate X3.phosphoglycerate _ MRPS2..51116. C8orf88..100127983. _ TFAP2A..7020. N.carbamoyl.beta.alanine _ phosphocreatine
hexanoylcarnitine _ N.carbamoyl.beta.alanine valerylcarnitine.isovalerylcarnitine.2.methylbutyroylcarnitine _ SAAL1..113174. C32.0.PC _ proline C18.0.LPC _ DHAP.glyceraldehyde.3P cytidine _ SERP1..27230. C8orf88..100127983. _ USP41..373856. thiamine _ cytidine
hexanoylcarnitine _ acetylcholine anserine _ X3.methyladipate.pimelate C32.0.PC _ methionine.sulfoxide C18.0.LPC _ beta.alanine uracil _ SERP1..27230. cystathionine _ CUL3..8452. choline _ cytidine
C32.0.PC _ OAZ1..4946. C20.4.LPC _ RPL18..6141. C40.6.PC _ cytidine C18.0.LPE _ TRAPPC1..58485. phosphocreatine _ ATP6V1H..51606. cystathionine _ NEK2..4751. acetylcholine _ ITGA6..3655.
C34.4.PC _ PPDPF..79144. C20.4.LPC _ F1P.F6P.G1P.G6P C40.6.PC _ ornithine C18.0.LPE _ tryptophan phosphocreatine _ MRPS2..51116. cytidine _ GNL1..2794. acetylcholine _ FUT10..84750.
C34.4.PC _ X4.pyridoxate C18.0.LPE _ C20.4.LPC C40.6.PC _ thiamine C18.0.LPE _ methionine.sulfoxide phosphocreatine _ RHOF..54509. sorbitol _ cytidine acetylcholine _ CDC26..246184.
C34.4.PC _ hexoses..HILIC.neg. C38.6.PC _ X3.methyladipate.pimelate C18.1.SM _ C16.0.LPE C20.4.LPE _ allantoin phosphocreatine _ TCF25..22980. succinate.methylmalonate _ cytidine hexanoylcarnitine _ CDC26..246184.
C34.4.PC _ pyroglutamic.acid C38.6.PC _ pyroglutamic.acid C34.2.DAG _ ZFX..7543. C32.0.PC _ methionine.sulfoxide phosphocreatine _ NOSIP..51070. phosphocreatine _ CDYL..9425. anserine _ cytidine
C36.1.DAG _ OAZ1..4946. C18.0.CE _ RPL18..6141. C34.2.DAG _ FUT10..84750. C34.4.PC _ X4.pyridoxate phosphocreatine _ TRAPPC2L..51693. phosphocreatine _ LRRC8E..80131. C18.0.LPE _ RPL18..6141.
C36.1.DAG _ hexanoylcarnitine C18.0.CE _ GCSH..2653. C18.0.CE _ PRDX3..10935. C34.4.PC _ allantoin asparagine _ TRAPPC2L..51693. niacinamide _ cytidine C18.0.LPE _ SEPHS1..22929.
C18.1.CE _ lactate C18.0.CE _ RFT1..91869. C18.0.CE _ proline C34.3.PC _ C18.0.LPC asparagine _ cystathionine arachidonyl_carnitine _ cytidine C34.4.PC _ adipate
C18.0.CE _ OAZ1..4946. C18.0.CE _ tryptophan C18.0.CE _ methionine.sulfoxide C36.1.DAG _ X2.aminoadipate thiamine _ uracil arachidonyl_carnitine _ pyroglutamic.acid C40.6.PC _ cytidine
C18.0.CE _ hexanoylcarnitine C48.3.TAG _ CAPN5..726. C18.0.CE _ C18.0.LPE C36.1.DAG _ tryptophan niacinamide _ SERP1..27230. C18.0.LPE _ SEPHS1..22929. C18.0.CE _ RPL18..6141.
C54.6.TAG _ pyroglutamic.acid C48.3.TAG _ IRS1..3667. C46.2.TAG _ uridine C18.1.CE _ C18.0.LPE butyrylcarnitine.isobutyrylcarnitine _ MRPL24..79590. C18.0.LPE _ TBP..6908. C18.0.CE _ APRT..353.
NA C48.3.TAG _ C38.6.PC C56.2.TAG _ ELP4..26610. C18.0.CE _ X2.deoxyadenosine C36.1.PC _ X3.phosphoglycerate C32.0.PC _ CUL3..8452. C18.0.CE _ C18.0.LPE
NA NA C56.2.TAG _ glycine C18.0.CE _ cotinine C36.1.PC _ hexanoylcarnitine C32.0.PC _ FIBP..9158. C58.8.TAG _ cytidine
NA NA C56.2.TAG _ C16.0.LPE C18.0.CE _ hexanoylcarnitine C40.6.PC _ uracil C24.1.SM _ C18.0.LPE NA
NA NA NA C22.6.CE _ C18.0.LPE C18.0.CE _ C22.0.SM NA NA

Statistical Model

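The figure above shows that the response variable contains a particularly large number of zeros. An ordinary count model (e.g., Poisson regression) would fit these excess zeros poorly, so in the modeling that follows we fit a hurdle model to the count response (Feng 2021).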

The hurdle model is designed for count responses with excess zeros (zero inflation), which is why we use it for our count response. It rests on two key assumptions:

  • Zeros come from a single, systematic source.
  • Non-zero observations come from a separate distribution.

Accordingly, one set of variables predicts whether the count is zero, and another set models how many sites a cell line metastasizes to.
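The sketch below makes the two-part structure concrete by fitting a Poisson hurdle model by hand in base R. It uses simulated data; the covariates `x`, `z` and the coefficient values are hypothetical and not from our dataset. The zero part is a logistic regression for P(y > 0). The count part is a zero-truncated Poisson with a log link, fitted only to the positive counts by maximizing its log-likelihood with `optim()`. This matches the structure printed in the outputs below ("binomial with logit link" and "truncated poisson with log link"). It is an illustration only, not the code used to produce our results.

```r
# Minimal base-R sketch of a Poisson hurdle model on simulated (hypothetical) data.
set.seed(1)
n <- 500
x <- rnorm(n)                    # hypothetical covariate for the count part
z <- rnorm(n)                    # hypothetical covariate for the zero part

# Simulate: decide zero vs. positive first, then draw a zero-truncated Poisson count.
p_pos  <- plogis(-0.5 + 1.0 * z) # P(y > 0)
lambda <- exp(0.3 + 0.5 * x)     # Poisson mean before truncation
rtpois <- function(lam) {        # rejection sampler for a Poisson truncated at zero
  repeat {
    k <- rpois(1, lam)
    if (k > 0) return(k)
  }
}
y <- ifelse(runif(n) < p_pos, sapply(lambda, rtpois), 0)

# Part 1 (zero hurdle): logistic regression for the indicator y > 0.
h <- as.integer(y > 0)
zero_fit <- glm(h ~ z, family = binomial(link = "logit"))

# Part 2 (count): zero-truncated Poisson with log link, fitted to positive counts only.
pos <- y > 0
negll_tpois <- function(beta) {
  lam <- exp(beta[1] + beta[2] * x[pos])
  # log P(Y = y | Y > 0) = log dpois(y, lam) - log(1 - exp(-lam))
  -sum(dpois(y[pos], lam, log = TRUE) - log1p(-exp(-lam)))
}
count_fit <- optim(c(0, 0), negll_tpois, method = "BFGS")

# The likelihood factorizes, so the total log-likelihood is the sum of both parts.
total_loglik <- as.numeric(logLik(zero_fit)) - count_fit$value
coef(zero_fit)   # zero-hurdle estimates (true values: -0.5, 1.0)
count_fit$par    # truncated-count estimates (true values: 0.3, 0.5)
```

Because the two parts have separate parameters, they can be estimated independently. This is why the printed output reports two separate coefficient tables.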

The analysis below is organized into four parts:

  • Hurdle Model(Lasso)
  • Logistic(Lasso)
  • Hurdle Model(p-value)
  • Logistic(p-value)

The first two parts fit the hurdle model and logistic regression with the variables selected by lasso. The last two parts fit the same two models with the variables screened by p-value.
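This section does not spell out the exact p-value screening rule. As an illustration only, the sketch below assumes one common variant. It fits a univariate logistic regression for each candidate predictor and keeps the predictors whose Wald p-value falls below a threshold. The threshold `alpha = 0.05`, the function name `screen_by_pvalue`, and the simulated data are all hypothetical.

```r
# Hypothetical univariate p-value screen; not necessarily the rule used in this report.
screen_by_pvalue <- function(X, y, alpha = 0.05) {
  # X: data frame of candidate predictors; y: 0/1 response vector
  pvals <- vapply(names(X), function(v) {
    fit <- glm(y ~ X[[v]], family = binomial)
    # Row 2 of the coefficient table is the slope of predictor v
    summary(fit)$coefficients[2, "Pr(>|z|)"]
  }, numeric(1))
  names(pvals)[pvals < alpha]
}

set.seed(2)
n <- 200
X <- data.frame(signal = rnorm(n), noise1 = rnorm(n), noise2 = rnorm(n))
y <- rbinom(n, 1, plogis(1.5 * X$signal))
kept <- screen_by_pvalue(X, y)
kept  # "signal" should be kept; a noise column can slip through by chance
```

Each pure-noise predictor passes such a screen with probability about `alpha`. With thousands of candidate genes and metabolites, some selected variables will therefore be false positives.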

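In the hurdle model, the response is the number of sites each cell line metastasizes to. In the logistic regressions, the response is whether each cell line metastasizes to a given site, so we compare the results across the five sites.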
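The two kinds of response can be derived from the same five 0/1 metastasis indicators. The sketch below assumes the indicators are stored as one column per organ. The toy values are hypothetical. The name `count_response_label` appears in the hurdle call below, but how the report actually builds it is an assumption here.

```r
# Toy 0/1 metastasis indicators for four hypothetical cell lines (not our data).
organs <- c("bone", "brain", "kidney", "liver", "lung")
meta <- data.frame(
  bone   = c(1, 0, 0, 1),
  brain  = c(0, 0, 0, 1),
  kidney = c(1, 0, 0, 1),
  liver  = c(0, 0, 1, 1),
  lung   = c(1, 0, 0, 0)
)

# Hurdle response: number of organs each cell line metastasizes to (0 to 5).
count_response_label <- unname(rowSums(meta[, organs]))
count_response_label  # 3 0 1 4

# Logistic responses: one separate 0/1 outcome per organ.
binary_responses <- lapply(meta[, organs], as.integer)
```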

Hurdle Model(Lasso)

output

Call:
hurdle(formula = as.formula(paste("count_response_label~", paste(selected_variables_count[2:length(selected_variables_count)], 
    collapse = "+"), "|", paste(selected_variables_binary[2:length(selected_variables_binary)], 
    collapse = "+"))), data = df.analysis)

Pearson residuals:
      Min        1Q    Median        3Q       Max 
-1.880433 -0.277309 -0.001546  0.301803  3.484351 

Count model coefficients (truncated poisson with log link):
                                               Estimate Std. Error z value
(Intercept)                                   0.7785805  2.3989297   0.325
`ABCA7..10347. _ CD44..960.`                  0.0050204  0.0065063   0.772
`IFT74..80173. _ P2RX5..5026.`                0.0096181  0.0162809   0.591
`TFPT..29844. _ IFRD1..3475.`                 0.0036886  0.0141106   0.261
`ACCS..84680. _ ITGA6..3655.`                 0.0161465  0.0122927   1.313
`LYST..1130. _ IFRD1..3475.`                 -0.0212901  0.0169249  -1.258
`NOTCH1..4851. _ P2RX5..5026.`               -0.0022409  0.0204728  -0.109
`NOTCH1..4851. _ APOC1..341.`                -0.0034622  0.0115169  -0.301
`CAPN5..726. _ TFAP2A..7020.`                -0.0259128  0.0125300  -2.068
`PSTPIP2..9050. _ P2RX5..5026.`               0.0037162  0.0173216   0.215
`UCHL1..7345. _ LY75..4065.`                  0.0004349  0.0093209   0.047
`UCHL1..7345. _ LAMA4..3910.`                -0.0068863  0.0073674  -0.935
`UCHL1..7345. _ TFAP2A..7020.`                0.0013931  0.0058323   0.239
`ZSCAN12..9753. _ UCHL1..7345.`               0.0005454  0.0120434   0.045
`RFT1..91869. _ ITGAE..3682.`                 0.0092511  0.0195097   0.474
`GPX8..493869. _ ABCA7..10347.`              -0.0067645  0.0121870  -0.555
`GPX8..493869. _ LIPE..3991.`                 0.0027096  0.0124456   0.218
`GPX8..493869. _ NOTCH1..4851.`               0.0406923  0.0164676   2.471
`SAAL1..113174. _ GLTP..51228.`              -0.0125045  0.0215288  -0.581
`MYEOV..26579. _ GPX8..493869.`               0.0086645  0.0062252   1.392
`TNFRSF10D..8793. _ ACCS..84680.`             0.0079090  0.0124506   0.635
`CCDC106..29903. _ LAMA4..3910.`             -0.0020569  0.0158107  -0.130
`CCDC106..29903. _ TAPBPL..55080.`            0.0013778  0.0148532   0.093
`TMEM107..84314. _ GLTP..51228.`             -0.0104114  0.0170506  -0.611
`ZNF471..57573. _ DTNA..1837.`               -0.0011511  0.0344650  -0.033
`ZNF471..57573. _ UCHL1..7345.`              -0.0148002  0.0138220  -1.071
`CSF2RA..1438. _ GGACT..87769.`              -0.0051651  0.0151337  -0.341
`APRT..353. _ ITGAE..3682.`                   0.0129031  0.0186620   0.691
`S1PR3..1903. _ NHLRC1..378884.`             -0.0358895  0.0296372  -1.211
`PET117..100303755. _ THAP8..199745.`        -0.0085761  0.0266189  -0.322
`MAGEA2..4101. _ P2RX5..5026.`                0.0162396  0.0074052   2.193
`cystathionine _ CDC26..246184.`             -0.0111638  0.0190426  -0.586
`cytidine _ adipate`                          0.0216246  0.0633536   0.341
`N.carbamoyl.beta.alanine _ phosphocreatine`  0.0044950  0.0138967   0.323
`thiamine _ cytidine`                         0.0135914  0.0399823   0.340
`choline _ cytidine`                         -0.0035972  0.0310990  -0.116
`acetylcholine _ ITGA6..3655.`               -0.0085518  0.0113703  -0.752
`acetylcholine _ FUT10..84750.`               0.0301716  0.0180352   1.673
`acetylcholine _ CDC26..246184.`              0.0011361  0.0186859   0.061
`hexanoylcarnitine _ CDC26..246184.`          0.0129695  0.0266745   0.486
`anserine _ cytidine`                        -0.0587707  0.0560224  -1.049
`C18.0.LPE _ RPL18..6141.`                    0.0146166  0.0216357   0.676
`C18.0.LPE _ SEPHS1..22929.`                  0.0121880  0.0212721   0.573
`C34.4.PC _ adipate`                         -0.0907583  0.0450530  -2.014
`C40.6.PC _ cytidine`                        -0.0166182  0.0474962  -0.350
`C18.0.CE _ RPL18..6141.`                    -0.0001464  0.0202620  -0.007
`C18.0.CE _ APRT..353.`                       0.0131946  0.0233526   0.565
`C18.0.CE _ C18.0.LPE`                        0.0168788  0.0369485   0.457
`C58.8.TAG _ cytidine`                        0.0285728  0.0276685   1.033
                                             Pr(>|z|)  
(Intercept)                                    0.7455  
`ABCA7..10347. _ CD44..960.`                   0.4403  
`IFT74..80173. _ P2RX5..5026.`                 0.5547  
`TFPT..29844. _ IFRD1..3475.`                  0.7938  
`ACCS..84680. _ ITGA6..3655.`                  0.1890  
`LYST..1130. _ IFRD1..3475.`                   0.2084  
`NOTCH1..4851. _ P2RX5..5026.`                 0.9128  
`NOTCH1..4851. _ APOC1..341.`                  0.7637  
`CAPN5..726. _ TFAP2A..7020.`                  0.0386 *
`PSTPIP2..9050. _ P2RX5..5026.`                0.8301  
`UCHL1..7345. _ LY75..4065.`                   0.9628  
`UCHL1..7345. _ LAMA4..3910.`                  0.3499  
`UCHL1..7345. _ TFAP2A..7020.`                 0.8112  
`ZSCAN12..9753. _ UCHL1..7345.`                0.9639  
`RFT1..91869. _ ITGAE..3682.`                  0.6354  
`GPX8..493869. _ ABCA7..10347.`                0.5789  
`GPX8..493869. _ LIPE..3991.`                  0.8277  
`GPX8..493869. _ NOTCH1..4851.`                0.0135 *
`SAAL1..113174. _ GLTP..51228.`                0.5614  
`MYEOV..26579. _ GPX8..493869.`                0.1640  
`TNFRSF10D..8793. _ ACCS..84680.`              0.5253  
`CCDC106..29903. _ LAMA4..3910.`               0.8965  
`CCDC106..29903. _ TAPBPL..55080.`             0.9261  
`TMEM107..84314. _ GLTP..51228.`               0.5415  
`ZNF471..57573. _ DTNA..1837.`                 0.9734  
`ZNF471..57573. _ UCHL1..7345.`                0.2843  
`CSF2RA..1438. _ GGACT..87769.`                0.7329  
`APRT..353. _ ITGAE..3682.`                    0.4893  
`S1PR3..1903. _ NHLRC1..378884.`               0.2259  
`PET117..100303755. _ THAP8..199745.`          0.7473  
`MAGEA2..4101. _ P2RX5..5026.`                 0.0283 *
`cystathionine _ CDC26..246184.`               0.5577  
`cytidine _ adipate`                           0.7329  
`N.carbamoyl.beta.alanine _ phosphocreatine`   0.7463  
`thiamine _ cytidine`                          0.7339  
`choline _ cytidine`                           0.9079  
`acetylcholine _ ITGA6..3655.`                 0.4520  
`acetylcholine _ FUT10..84750.`                0.0943 .
`acetylcholine _ CDC26..246184.`               0.9515  
`hexanoylcarnitine _ CDC26..246184.`           0.6268  
`anserine _ cytidine`                          0.2942  
`C18.0.LPE _ RPL18..6141.`                     0.4993  
`C18.0.LPE _ SEPHS1..22929.`                   0.5667  
`C34.4.PC _ adipate`                           0.0440 *
`C40.6.PC _ cytidine`                          0.7264  
`C18.0.CE _ RPL18..6141.`                      0.9942  
`C18.0.CE _ APRT..353.`                        0.5721  
`C18.0.CE _ C18.0.LPE`                         0.6478  
`C58.8.TAG _ cytidine`                         0.3018  
Zero hurdle model coefficients (binomial with logit link):
                                              Estimate Std. Error z value
(Intercept)                                 -65.831516  41.922696  -1.570
`ST6GAL1..6480. _ PTPRU..10076.`              0.785954   0.537187   1.463
`TFPT..29844. _ SYNE2..23224.`                0.106082   0.226106   0.469
`ELK3..2004. _ PTPRU..10076.`                 0.534136   0.373105   1.432
`TBP..6908. _ PDE8A..5151.`                   0.259682   0.647792   0.401
`NAAA..27163. _ TRIM2..23321.`               -0.105808   0.168153  -0.629
`NAAA..27163. _ LAMA4..3910.`                -0.121971   0.184572  -0.661
`PSTPIP2..9050. _ ST6GAL1..6480.`            -0.133492   0.196254  -0.680
`UCHL1..7345. _ LY75..4065.`                 -0.123323   0.116142  -1.062
`GPSM1..26086. _ NLRP1..22861.`              -0.329695   0.324790  -1.015
`GPX8..493869. _ HFE..3077.`                 -0.008244   0.168879  -0.049
`GPX8..493869. _ PTPRU..10076.`              -0.756753   0.570139  -1.327
`GPX8..493869. _ PHKA1..5255.`                0.776258   0.525896   1.476
`GPX8..493869. _ MDFI..4188.`                 0.439733   0.298310   1.474
`LRRC8E..80131. _ PDE8A..5151.`               0.242151   0.761866   0.318
`KLF11..8462. _ TFPT..29844.`                -0.230128   0.427365  -0.538
`MSRA..4482. _ PHKA1..5255.`                 -0.482016   0.432332  -1.115
`MSRA..4482. _ NLRP1..22861.`                 0.509924   0.360161   1.416
`TMEM107..84314. _ GPX8..493869.`             0.208776   0.284030   0.735
`KLHL28..54813. _ TFPT..29844.`              -0.828348   0.718041  -1.154
`ZNF829..374899. _ GALNT18..374378.`          0.251680   0.613884   0.410
`ZNF829..374899. _ USP41..373856.`           -0.568860   0.549050  -1.036
`SIPA1L1..26037. _ TFPT..29844.`              0.050209   0.340648   0.147
`SIPA1L1..26037. _ KLF11..8462.`             -0.360363   0.514857  -0.700
`RANBP17..64901. _ PHLDA1..22822.`            0.351907   0.262791   1.339
`SH3D21..79729. _ GALNT18..374378.`          -0.370852   0.270142  -1.373
`SH3D21..79729. _ NAAA..27163.`              -0.326184   0.296255  -1.101
`SH3D21..79729. _ UCHL1..7345.`              -0.034603   0.078848  -0.439
`SH3D21..79729. _ AUTS2..26053.`              0.114889   0.219981   0.522
`SH3D21..79729. _ KLF11..8462.`               0.286212   0.635670   0.450
`C4orf48..401115. _ PHLDA1..22822.`           0.313171   0.240991   1.300
`C4orf48..401115. _ LRRC8E..80131.`          -0.267587   0.303654  -0.881
`C8orf88..100127983. _ LY75..4065.`          -1.242647   0.931264  -1.334
`C8orf88..100127983. _ TFAP2A..7020.`         0.396705   0.383993   1.033
`C8orf88..100127983. _ USP41..373856.`       -0.385032   0.343129  -1.122
`cystathionine _ CUL3..8452.`                 1.075081   0.598631   1.796
`cystathionine _ NEK2..4751.`                -0.074744   0.235208  -0.318
`cytidine _ GNL1..2794.`                     -1.349332   0.716755  -1.883
`sorbitol _ cytidine`                         0.832421   0.660956   1.259
`succinate.methylmalonate _ cytidine`         0.514038   0.991136   0.519
`phosphocreatine _ CDYL..9425.`               0.183228   0.263915   0.694
`phosphocreatine _ LRRC8E..80131.`            0.860374   0.777217   1.107
`niacinamide _ cytidine`                     -0.781648   0.599584  -1.304
`arachidonyl_carnitine _ cytidine`           -0.167615   0.510504  -0.328
`arachidonyl_carnitine _ pyroglutamic.acid`  -0.076275   0.530937  -0.144
`C18.0.LPE _ SEPHS1..22929.`                  0.396963   0.403481   0.984
`C18.0.LPE _ TBP..6908.`                     -0.458676   0.528661  -0.868
`C32.0.PC _ CUL3..8452.`                     -0.767100   0.582971  -1.316
`C32.0.PC _ FIBP..9158.`                      0.949049   0.530851   1.788
`C24.1.SM _ C18.0.LPE`                        0.843194   0.503819   1.674
                                            Pr(>|z|)  
(Intercept)                                   0.1163  
`ST6GAL1..6480. _ PTPRU..10076.`              0.1434  
`TFPT..29844. _ SYNE2..23224.`                0.6389  
`ELK3..2004. _ PTPRU..10076.`                 0.1523  
`TBP..6908. _ PDE8A..5151.`                   0.6885  
`NAAA..27163. _ TRIM2..23321.`                0.5292  
`NAAA..27163. _ LAMA4..3910.`                 0.5087  
`PSTPIP2..9050. _ ST6GAL1..6480.`             0.4964  
`UCHL1..7345. _ LY75..4065.`                  0.2883  
`GPSM1..26086. _ NLRP1..22861.`               0.3101  
`GPX8..493869. _ HFE..3077.`                  0.9611  
`GPX8..493869. _ PTPRU..10076.`               0.1844  
`GPX8..493869. _ PHKA1..5255.`                0.1399  
`GPX8..493869. _ MDFI..4188.`                 0.1405  
`LRRC8E..80131. _ PDE8A..5151.`               0.7506  
`KLF11..8462. _ TFPT..29844.`                 0.5902  
`MSRA..4482. _ PHKA1..5255.`                  0.2649  
`MSRA..4482. _ NLRP1..22861.`                 0.1568  
`TMEM107..84314. _ GPX8..493869.`             0.4623  
`KLHL28..54813. _ TFPT..29844.`               0.2487  
`ZNF829..374899. _ GALNT18..374378.`          0.6818  
`ZNF829..374899. _ USP41..373856.`            0.3002  
`SIPA1L1..26037. _ TFPT..29844.`              0.8828  
`SIPA1L1..26037. _ KLF11..8462.`              0.4840  
`RANBP17..64901. _ PHLDA1..22822.`            0.1805  
`SH3D21..79729. _ GALNT18..374378.`           0.1698  
`SH3D21..79729. _ NAAA..27163.`               0.2709  
`SH3D21..79729. _ UCHL1..7345.`               0.6608  
`SH3D21..79729. _ AUTS2..26053.`              0.6015  
`SH3D21..79729. _ KLF11..8462.`               0.6525  
`C4orf48..401115. _ PHLDA1..22822.`           0.1938  
`C4orf48..401115. _ LRRC8E..80131.`           0.3782  
`C8orf88..100127983. _ LY75..4065.`           0.1821  
`C8orf88..100127983. _ TFAP2A..7020.`         0.3016  
`C8orf88..100127983. _ USP41..373856.`        0.2618  
`cystathionine _ CUL3..8452.`                 0.0725 .
`cystathionine _ NEK2..4751.`                 0.7507  
`cytidine _ GNL1..2794.`                      0.0598 .
`sorbitol _ cytidine`                         0.2079  
`succinate.methylmalonate _ cytidine`         0.6040  
`phosphocreatine _ CDYL..9425.`               0.4875  
`phosphocreatine _ LRRC8E..80131.`            0.2683  
`niacinamide _ cytidine`                      0.1924  
`arachidonyl_carnitine _ cytidine`            0.7427  
`arachidonyl_carnitine _ pyroglutamic.acid`   0.8858  
`C18.0.LPE _ SEPHS1..22929.`                  0.3252  
`C18.0.LPE _ TBP..6908.`                      0.3856  
`C32.0.PC _ CUL3..8452.`                      0.1882  
`C32.0.PC _ FIBP..9158.`                      0.0738 .
`C24.1.SM _ C18.0.LPE`                        0.0942 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Number of iterations in BFGS optimization: 59 
Log-likelihood: -267.6 on 99 Df

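The upper half of the output shows the count component (truncated Poisson), and the lower half shows the zero-hurdle component (binomial). Only four terms are significant at the 5% level: CAPN5..726. _ TFAP2A..7020., GPX8..493869. _ NOTCH1..4851., MAGEA2..4101. _ P2RX5..5026., and C34.4.PC _ adipate. All four are in the count component; no zero-hurdle coefficient reaches p < 0.05.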
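As a check on how significance is read from the output, each `z value` equals `Estimate / Std. Error`, and `Pr(>|z|)` is the two-sided normal tail probability. The sketch below recomputes two rows of the count table from their printed estimates; the short labels are hypothetical. Because the count part uses a log link, `exp(Estimate)` is the multiplicative change in the untruncated Poisson mean per one-unit increase in the predictor.

```r
# Recompute Wald z statistics and two-sided p-values from printed estimates.
est <- c(CAPN5_TFAP2A = -0.0259128, GPX8_NOTCH1 = 0.0406923)
se  <- c(CAPN5_TFAP2A =  0.0125300, GPX8_NOTCH1 = 0.0164676)
z_val <- est / se
p_val <- 2 * pnorm(-abs(z_val))
# Log link: exp(est) rescales the untruncated Poisson mean lambda.
rate_ratio <- exp(est)
round(cbind(z_val, p_val, rate_ratio), 4)
```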

Next, we check the mean squared error (MSE) to assess how well the model fits.

Results

MSE variance
0.7633929 3.831839

The MSE (0.763) is well below the variance (3.832), which suggests that the model fits reasonably well.
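The comparison can be made explicit. This assumes the reported variance is that of the observed counts and that the predictions are the hurdle model's fitted means. Predicting the sample mean for every cell line gives an MSE close to that variance. A ratio MSE / variance well below 1 therefore means the model captures part of the variation; for the values above, it is about 0.20. The helper names below are hypothetical.

```r
# Sketch: compare a model's MSE with the variance of the observed response.
mse_vs_variance <- function(y, y_hat) {
  mse <- mean((y - y_hat)^2)
  c(MSE = mse, variance = var(y), ratio = mse / var(y))
}

# Hypothetical helper: mean of a Poisson hurdle model,
# E[Y] = P(Y > 0) * lambda / (1 - exp(-lambda)).
hurdle_mean <- function(p_pos, lambda) p_pos * lambda / (1 - exp(-lambda))

y <- c(0, 1, 2, 3, 4)
mse_vs_variance(y, rep(mean(y), length(y)))  # baseline: MSE 2, variance 2.5, ratio 0.8
```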

Logistic(Lasso)

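The table below summarizes each variable's coefficients for the five organs. Numbers in parentheses are standard errors, and asterisks mark significance (*p < 0.1, **p < 0.05, ***p < 0.01). A blank cell means the variable was already dropped for that organ at the variable-selection stage.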

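The table shows that the variables selected for the five organs already differ markedly at the selection stage: a variable is shared by at most two organs. Looking further at significance, no variable is marked significant for more than one organ. This suggests that metastasis to the five organs is not driven by shared factors.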

Dependent variable:
bone brain kidney liver lung
(1) (2) (3) (4) (5)
EHHADH..1962. LAMA4..3910. -0.195*
(0.115)
IGFLR1..79713. LMCD1..29995. 0.149
(0.252)
IGFLR1..79713. PIGT..51604. 0.231
(0.411)
RAB15..376267. PPDPF..79144. -0.046
(0.068)
RAB15..376267. DTNA..1837. 0.018
(0.132)
LYST..1130. EHHADH..1962. -0.251
(0.175)
CIART..148523. LAMA4..3910. -0.174
(0.119)
CIART..148523. RAB15..376267. 0.018
(0.183)
THAP8..199745. SYNE2..23224. 0.435*
(0.246)
THAP8..199745. CIART..148523. -0.629*
(0.341)
SMIM14..201895. DTNA..1837. -0.097
(0.117)
SAAL1..113174. P2RX5..5026. 0.237***
(0.076)
CCDC106..29903. LMCD1..29995. -0.192
(0.182)
SESTD1..91404. IGFLR1..79713. -0.180
(0.236)
SESTD1..91404. CCDC106..29903. -0.143
(0.153)
ZNF286A..57335. OAZ1..4946. 0.589
(0.431)
ZNF418..147686. CCDC106..29903. -0.142
(0.132)
TVP23C.CDRT4..100533496. SAAL1..113174. -0.360
(0.298)
TVP23C.CDRT4..100533496. CDC26..246184. 0.477
(0.573)
dCMP OAZ1..4946. 0.308
(0.376)
hexoses..HILIC.neg. cytidine -0.322*
(0.167)
lactate dCMP 0.043
(0.543)
choline homocysteine -0.391*
(0.212)
acetylcholine ZNF286A..57335. -0.672
(0.712)
acetylcholine TVP23C.CDRT4..100533496. 0.008
(0.480)
pipecolic.acid SYNE2..23224. -0.100
(0.107)
pipecolic.acid PIGT..51604. -0.272
(0.205)
pipecolic.acid IGFLR1..79713. -0.534
(0.482)
pipecolic.acid hexoses..HILIC.neg. 0.127
(0.415)
pipecolic.acid taurodeoxycholate.taurochenodeoxycholate -0.009
(0.085)
pipecolic.acid choline 0.164
(0.263)
hexanoylcarnitine OAZ1..4946. -0.228
(0.348)
ELP4..26610. P2RX5..5026. 2.900
(185,829.600)
EXOC2..55770. TYRO3..7301. 9.431
(108,422.700)
RTN4IP1..84816. TYRO3..7301. -4.570
(274,060.700)
TEX2..55852. EHHADH..1962. 8.506
(469,437.500)
FEM1A..55527. SPRY2..10253. 2.378
(272,152.200)
UCHL1..7345. TEX2..55852. -4.717
(63,658.730)
NOS3..4846. P2RX5..5026. 15.001
(170,478.000)
HDX..139324. PIK3IP1..113791. -1.861
(591,306.800)
TRAPPC1..58485. ELP4..26610. 5.911
(239,979.500)
TNFRSF10D..8793. FEM1A..55527. 5.931
(304,849.300)
PHYHD1..254295. FEM1A..55527. 48.677
(1,004,186.000)
TMEM107..84314. EXOC2..55770. 4.319
(318,894.600)
ARPIN..348110. FEM1A..55527. 16.912
(1,200,561.000)
ARPIN..348110. PHYHD1..254295. -10.876
(176,313.600)
X4.pyridoxate alpha.glycerophosphate -9.870
(230,769.800)
cytidine TEX2..55852. -51.668
(878,007.700)
cytidine X4.pyridoxate 24.123
(810,841.400)
sorbitol cytidine -5.739
(1,564,586.000)
thymine TEX2..55852. 33.753
(917,168.500)
thymine cytidine -0.009
(982,577.800)
taurodeoxycholate.taurochenodeoxycholate thymine -7.464
(146,189.300)
tryptophan X2.hydroxyglutarate -6.877
(362,236.900)
anthranilic.acid HNRNPM..4670. -5.842
(205,428.600)
anthranilic.acid glycine 2.402
(491,962.700)
thiamine thymine -12.060
(552,097.200)
adenosine FEM1A..55527. 19.910
(457,603.600)
X2.deoxyadenosine tryptophan -133.569
(4,993,464.000)
X2.deoxyadenosine anthranilic.acid 46.878
(253,311.500)
methionine.sulfoxide cotinine 86.805
(1,967,290.000)
hexanoylcarnitine TRAPPC1..58485. 32.774
(4,653,879.000)
hexanoylcarnitine CDC26..246184. -0.005 23.433
(0.308) (106,126.200)
hexanoylcarnitine N.carbamoyl.beta.alanine 0.283**
(0.120)
hexanoylcarnitine acetylcholine 0.651
(0.427)
C32.0.PC OAZ1..4946. -0.082
(0.189)
C34.4.PC PPDPF..79144. -0.031
(0.101)
beta.alanine TRAPPC1..58485. 28.258
(247,574.600)
C18.0.LPC X3.phosphoglycerate 9.871
(805,787.000)
C18.0.LPC DHAP.glyceraldehyde.3P 12.634
(803,323.800)
C18.0.LPC beta.alanine -30.668
(831,919.000)
C18.0.LPE TRAPPC1..58485. -56.018
(4,606,700.000)
C18.0.LPE tryptophan 150.947
(4,727,068.000)
C34.4.PC X4.pyridoxate -0.196 -50.472
(0.223) (1,166,273.000)
C34.4.PC hexoses..HILIC.neg. -0.177
(0.470)
C34.4.PC pyroglutamic.acid -0.236
(0.371)
C36.1.DAG OAZ1..4946. -2.027
(1.895)
C36.1.DAG hexanoylcarnitine 3.899
(3.036)
C18.1.CE lactate 0.057
(0.456)
C18.0.CE OAZ1..4946. 2.217
(1.880)
C34.4.PC allantoin 12.810
(1,469,676.000)
C34.3.PC C18.0.LPC 45.228
(1,289,816.000)
C36.1.DAG X2.aminoadipate -0.018
(780,216.600)
C36.1.DAG tryptophan 8.553
(352,427.300)
C18.1.CE C18.0.LPE 26.907
(704,041.600)
C18.0.CE X2.deoxyadenosine 99.024
(5,011,904.000)
C18.0.CE cotinine -61.681
(1,890,961.000)
C18.0.CE hexanoylcarnitine -3.447 -40.969
(3.044) (5,418,910.000)
C54.6.TAG pyroglutamic.acid 0.062
(0.225)
TAPBPL..55080. ZCWPW1..55063. 0.183
(0.371)
PSTPIP2..9050. TNS4..84951. 0.052
(0.072)
PLCL2..23228. TNS4..84951. 0.148
(0.140)
PLCL2..23228. GCSH..2653. 0.148*
(0.088)
RFT1..91869. RPL18..6141. 0.258
(0.455)
NNMT..4837. CAPN5..726. -0.062
(0.108)
NNMT..4837. MAGED4..728239. -0.170**
(0.084)
NNMT..4837. THAP8..199745. 0.081
(0.118)
SAAL1..113174. EML2..24139. 0.190
(0.156)
ZNF608..57507. CAPN5..726. -0.147
(0.159)
IRS1..3667. ZCWPW1..55063. -0.416
(0.354)
IRS1..3667. TAPBPL..55080. 0.344
(0.274)
IRS1..3667. ZNF608..57507. -0.035
(0.145)
MYEOV..26579. SAAL1..113174. 0.101**
(0.040)
C3orf62..375341. EML2..24139. 0.322**
(0.158)
C3orf62..375341. PSTPIP2..9050. -0.290
(0.332)
TCEAL3..85012. ZCWPW1..55063. 0.112
(0.171)
TCEAL3..85012. CAPN5..726. 0.011
(0.122)
TCEAL3..85012. NNMT..4837. -0.020
(0.058)
ZNF486..90649. ZCWPW1..55063. -0.383
(0.515)
ZNF486..90649. CAPN5..726. -0.103
(0.219)
ZNF486..90649. THAP8..199745. 0.187
(0.397)
cytidine TAPBPL..55080. -0.336*
(0.203)
cytidine CAPN5..726. 0.146
(0.412)
urate F1P.F6P.G1P.G6P 0.334
(0.257)
X3.methyladipate.pimelate cytidine -0.340
(0.274)
tryptophan SAAL1..113174. -0.628
(0.763)
cis.trans.hydroxyproline tryptophan 0.322*
(0.178)
pyroglutamic.acid uridine -0.193
(0.171)
pyroglutamic.acid X3.methyladipate.pimelate 0.026
(0.488)
X1.methylnicotinamide cytidine -0.098
(0.072)
valerylcarnitine.isovalerylcarnitine.2.methylbutyroylcarnitine GCSH..2653. -0.663
(0.636)
valerylcarnitine.isovalerylcarnitine.2.methylbutyroylcarnitine PSTPIP2..9050. 0.175
(0.138)
valerylcarnitine.isovalerylcarnitine.2.methylbutyroylcarnitine SAAL1..113174. 0.620
(0.709)
anserine X3.methyladipate.pimelate -0.287
(0.319)
C20.4.LPC RPL18..6141. 0.332
(0.270)
C20.4.LPC F1P.F6P.G1P.G6P -0.061
(0.294)
C18.0.LPE C20.4.LPC -0.049
(0.216)
C38.6.PC X3.methyladipate.pimelate -0.315
(0.365)
C38.6.PC pyroglutamic.acid -0.125
(0.491)
C18.0.CE RPL18..6141. -0.462
(0.429)
C18.0.CE GCSH..2653. 0.770
(0.661)
C18.0.CE RFT1..91869. -0.334
(0.790)
C18.0.CE tryptophan 0.568
(0.646)
C48.3.TAG CAPN5..726. -0.183
(0.398)
C48.3.TAG IRS1..3667. -0.082
(0.167)
C48.3.TAG C38.6.PC -0.122
(0.247)
CD44..960. ZFX..7543. -0.101
(0.251)
LIPE..3991. CD44..960. 0.138
(0.096)
P2RX5..5026. LIPE..3991. 0.130
(0.155)
ELP4..26610. CD44..960. 0.139
(0.228)
SLCO1B3..28234. P2RX5..5026. 0.164*
(0.094)
TENT5A..55603. LIPE..3991. -0.376**
(0.175)
ZNF513..130557. SLC38A5..92745. -0.084
(0.084)
SCARA3..51435. LIPE..3991. 0.138
(0.159)
LRRC8E..80131. ZFX..7543. 0.077
(0.205)
FUT10..84750. SCARA3..51435. -0.030
(0.271)
ZNF548..147694. ZNF513..130557. -0.322**
(0.134)
ZNF548..147694. SCARA3..51435. 0.188
(0.225)
ZNF471..57573. TENT5A..55603. 0.094
(0.387)
ZNF471..57573. SERPINB9..5272. 0.487
(0.309)
ZNF471..57573. FUT10..84750. -0.867**
(0.421)
TOR4A..54863. APOC1..341. 0.041
(0.086)
TOR4A..54863. SERPINB9..5272. -0.072
(0.101)
TOR4A..54863. LRRC8E..80131. 0.170
(0.136)
IRF9..10379. ZNF548..147694. -0.022
(0.113)
SH3D21..79729. TFAP2A..7020. -0.103
(0.099)
SH3D21..79729. EIF2AK3..9451. -0.175
(0.153)
cytidine EXOC1..55763. -0.191
(0.167)
cytidine KLF11..8462. -0.287**
(0.118)
uridine EIF2AK3..9451. -0.021
(0.167)
thiamine adipate 0.326
(0.522)
thiamine uridine -0.453
(0.434)
C18.2.LPC uridine 0.395
(0.450)
C18.2.LPC ornithine 2.873
(3.190)
C18.2.LPC thiamine -4.020
(3.380)
C18.2.LPC anserine 0.038
(0.368)
C16.0.LPE methionine.sulfoxide 0.334
(0.640)
C22.6.CE C18.0.LPE 0.980
(812,322.300)
C18.0.LPE methionine.sulfoxide 0.665 -77.662
(0.564) (1,612,247.000)
C32.0.PC ZFX..7543. 0.584
(0.435)
C32.0.PC proline -6.011*
(3.556)
C20.4.LPE allantoin -60.370
(1,776,697.000)
C32.0.PC methionine.sulfoxide 6.039* 6.315
(3.548) (498,688.600)
C40.6.PC cytidine -0.310
(0.241)
C40.6.PC ornithine -3.343
(3.234)
C40.6.PC thiamine 3.553
(3.436)
C18.1.SM C16.0.LPE 0.541
(0.421)
C34.2.DAG ZFX..7543. -0.121
(0.308)
C34.2.DAG FUT10..84750. 0.874***
(0.248)
C18.0.CE PRDX3..10935. 0.255*
(0.155)
C18.0.CE proline 5.984*
(3.521)
C18.0.CE methionine.sulfoxide -5.721
(3.479)
C18.0.CE C18.0.LPE -0.191
(0.600)
C46.2.TAG uridine -0.259
(0.214)
C56.2.TAG ELP4..26610. -0.054
(0.301)
C56.2.TAG glycine 0.138
(0.290)
C56.2.TAG C16.0.LPE -0.034
(0.420)
SLC1A5..6510. ITGAE..3682. 14.622
(107,726.800)
COL12A1..1303. WNT5B..81029. -24.725
(64,568.800)
SERP1..27230. CDIPT..10423. 11.917
(131,562.200)
ECHDC2..55268. FXYD5..53827. 4.804
(41,182.130)
EXOSC9..5393. SLC1A5..6510. 1.345
(215,873.100)
RHOF..54509. DHODH..1723. -60.737
(212,771.100)
RHOF..54509. EXOSC9..5393. 22.200
(361,339.100)
GCSH..2653. SLC1A5..6510. 1.775
(145,703.900)
GCSH..2653. ARHGEF2..9181. 17.874
(271,983.600)
LYST..1130. FBXL16..146330. 2.217
(99,482.960)
UCHL1..7345. TNFRSF1B..7133. -1.273
(54,620.290)
UCHL1..7345. LYST..1130. -9.335
(103,855.400)
ZSCAN12..9753. TNFRSF1B..7133. -35.454
(203,622.900)
ZSCAN12..9753. PLCB4..5332. -18.249
(408,093.900)
ZSCAN12..9753. UCHL1..7345. -9.320
(259,307.800)
CBS..875. FXYD5..53827. 13.665
(54,057.940)
CBS..875. SLCO1B3..28234. 12.703
(94,593.890)
THAP8..199745. ZSCAN12..9753. 3.321
(671,903.000)
THAP8..199745. USP41..373856. -27.541
(319,673.300)
NXF1..10482. MRPL24..79590. -1.004
(194,257.600)
MYEOV..26579. MACROH2A2..55506. 2.008
(27,009.070)
MYEOV..26579. TIAM1..7074. 3.822
(55,854.160)
TNFRSF10D..8793. P2RX5..5026. 2.050
(106,126.900)
TNFRSF10D..8793. FXYD5..53827. -1.804
(79,994.880)
TNFRSF10D..8793. ECHDC2..55268. 8.611
(128,741.800)
TNFRSF10D..8793. LRRC8C..84230. 30.142
(88,929.540)
TMEM17..200728. RHOF..54509. 31.576
(182,970.400)
ZNF471..57573. PLCB4..5332. -62.844
(599,287.900)
ZNF471..57573. WNT5B..81029. 52.293
(417,750.200)
ZNF471..57573. UCHL1..7345. -18.551
(61,398.640)
S1PR3..1903. ZSCAN12..9753. 0.950
(733,295.300)
S1PR3..1903. ZNF382..84911. -22.470
(443,563.200)
X3.phosphoglycerate MRPS2..51116. 69.860
(594,351.300)
cytidine SERP1..27230. -2.838
(113,041.600)
uracil SERP1..27230. -53.893
(680,687.400)
phosphocreatine ATP6V1H..51606. 24.885
(323,489.400)
phosphocreatine MRPS2..51116. -27.071
(834,676.800)
phosphocreatine RHOF..54509. 2.182
(331,227.800)
phosphocreatine TCF25..22980. 9.911
(240,858.900)
phosphocreatine NOSIP..51070. 23.374
(354,943.300)
phosphocreatine TRAPPC2L..51693. -11.723
(537,111.700)
asparagine TRAPPC2L..51693. 34.992
(401,269.500)
asparagine cystathionine 5.349
(269,326.700)
thiamine uracil -12.193
(377,754.800)
niacinamide SERP1..27230. 24.495
(444,739.400)
butyrylcarnitine.isobutyrylcarnitine MRPL24..79590. -22.548
(199,257.900)
C36.1.PC X3.phosphoglycerate -13.117
(263,980.700)
C36.1.PC hexanoylcarnitine 74.308
(309,206.100)
C40.6.PC uracil 10.083
(582,592.500)
C18.0.CE C22.0.SM 3.623
(768,483.200)
Constant -15.683 5.721 -39.597 -2,271.898 -6,084.995
(21.662) (19.707) (26.147) (57,002,180.000) (26,862,734.000)
Observations 224 224 224 224 224
Log Likelihood -42.997 -50.145 -43.326 -0.000 -0.000
Akaike Inf. Crit. 179.993 196.291 186.652 102.000 102.000
Note: *p<0.1; **p<0.05; ***p<0.01

AUC

bone brain liver lung kidney
0.9775354 0.9703058 1 1 0.9788322

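Both liver and lung have an AUC of 1. This is a computational artifact: when these glm models are fitted, R returns the warning "glm.fit: fitted probabilities numerically 0 or 1 occurred" (Bobbitt 2024), which signals (quasi-)complete separation. The log-likelihoods of 0 and the very large standard errors in the liver and lung columns of the table above point the same way. Fixing this would require reselecting the variables. Here, instead, we split the data into training and test sets to check whether the models remain effective on held-out data.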
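The sketch below shows why in-sample AUC can reach exactly 1. It computes AUC through its rank (Mann-Whitney) form: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as half. It then applies this to a toy data set with complete separation, where `glm()` typically emits the warning quoted above. The `auc` helper and the data are hypothetical, not the code behind the reported values.

```r
# Rank-based AUC (Mann-Whitney U statistic scaled to [0, 1]).
auc <- function(score, label) {
  r <- rank(score)                 # average ranks handle ties
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data with complete separation: every x > 0 has y = 1.
x <- c(-3, -2, -1, -0.5, 0.5, 1, 2, 3)
y <- c(0, 0, 0, 0, 1, 1, 1, 1)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {          # report and silence glm's separation warnings
    message("glm warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
auc(fitted(fit), y)                # in-sample AUC is 1 under complete separation
```

Under separation, the coefficients diverge and the standard errors explode, so a perfect in-sample AUC says little about out-of-sample performance.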

Accuracy

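Here we split the data into training and test sets at a ratio of 8:2, stratified by the class proportions of the response, to evaluate each model's accuracy.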
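A stratified 8:2 split and test-set accuracy can be sketched in base R as follows. The helper names `stratified_split` and `accuracy`, the 0.5 probability cutoff, and the simulated data are assumptions, not the report's code.

```r
# Stratified split: sample train_frac of the indices within each class of y.
# Note: set.seed() inside the function resets the global RNG state.
stratified_split <- function(y, train_frac = 0.8, seed = 1) {
  set.seed(seed)
  idx_by_class <- split(seq_along(y), y)
  train <- unlist(lapply(idx_by_class, function(idx) {
    # index into idx instead of sample(idx, ...) to avoid sample()'s length-1 behavior
    idx[sample.int(length(idx), size = round(train_frac * length(idx)))]
  }), use.names = FALSE)
  list(train = sort(train), test = setdiff(seq_along(y), train))
}

# Accuracy of 0/1 predictions obtained by thresholding predicted probabilities.
accuracy <- function(prob, truth, cutoff = 0.5) {
  mean(as.integer(prob >= cutoff) == truth)
}

# Usage on simulated data with the same number of rows as our merged data (224).
set.seed(3)
d <- data.frame(x = rnorm(224))
d$y <- rbinom(224, 1, plogis(2 * d$x))
s <- stratified_split(d$y)
fit <- glm(y ~ x, family = binomial, data = d[s$train, ])
p_test <- predict(fit, newdata = d[s$test, ], type = "response")
accuracy(p_test, d$y[s$test])
```

Stratifying keeps the class balance of the training and test sets close to the full data. This matters most for imbalanced outcomes such as brain metastasis.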

bone brain liver lung kidney
0.8 0.7173913 0.8695652 0.7391304 0.7173913

The next two subsections present the variables selected by p-value and their results. They use the same format as the lasso-based analyses above.

Hurdle Model(p-value)

output

Call:
hurdle(formula = as.formula(paste("count_response_label~", paste(selected_variables, 
    collapse = "+"), "|", paste(selected_variables, collapse = "+"))), 
    data = organ_candidate)

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-1.6869 -0.7209 -0.1228  0.6756  2.2963 

Count model coefficients (truncated poisson with log link):
                                     Estimate Std. Error z value Pr(>|z|)   
(Intercept)                         9.746e-01  7.981e+00   0.122  0.90281   
AMIGO1..57463.                     -2.331e+00  2.694e+00  -0.865  0.38680   
NECAB3..63941.                      5.799e-01  1.194e+00   0.486  0.62726   
SMAP1..60682.                       3.440e-01  1.303e+00   0.264  0.79181   
SEPTIN5..5413.                     -1.239e-01  2.633e-01  -0.471  0.63790   
MAPK8IP1..9479.                     2.109e-01  3.069e-01   0.687  0.49205   
ADCY9..115.                         5.532e-01  7.295e-01   0.758  0.44828   
CBX2..84733.                       -8.365e-01  4.012e-01  -2.085  0.03706 * 
SIK1B..102724428.                  -2.602e-01  1.454e-01  -1.790  0.07352 . 
GAS6..2621.                         9.198e-04  2.870e-02   0.032  0.97444   
homocysteine                       -1.063e+00  1.024e+00  -1.038  0.29939   
cytidine                            1.592e-01  3.957e-01   0.402  0.68752   
C32.2.PC                           -4.363e-01  3.539e-01  -1.233  0.21762   
C32.0.PC                            2.786e-01  5.126e-01   0.544  0.58678   
guanosine                           3.888e-01  1.504e-01   2.586  0.00972 **
C36.1.PC                            4.409e-01  3.426e-01   1.287  0.19807   
phosphocreatine                     4.289e-02  1.062e-01   0.404  0.68635   
C18.0.LPE                          -3.049e-01  4.148e-01  -0.735  0.46231   
C36.1.DAG                           3.032e-02  5.343e-01   0.057  0.95475   
SMAP1..60682.._.PHYH..5264.         4.237e-02  9.846e-02   0.430  0.66700   
MAPK8IP1..9479.._.SMAP1..60682.    -3.477e-02  7.716e-02  -0.451  0.65227   
NECAB3..63941.._.SMAP1..60682.     -3.032e-01  1.550e-01  -1.956  0.05046 . 
GSN..2934.._.PHYH..5264.           -2.611e-02  4.895e-02  -0.533  0.59376   
GSN..2934.._.SMAP1..60682.         -3.255e-03  4.831e-02  -0.067  0.94627   
PKIG..11142.._.NECAB3..63941.      -9.170e-02  6.158e-02  -1.489  0.13642   
PKIG..11142.._.GSN..2934.           4.689e-02  3.461e-02   1.355  0.17554   
AMIGO1..57463.._.GAMT..2593.       -1.337e-02  5.021e-02  -0.266  0.79009   
AMIGO1..57463.._.GSN..2934.         1.527e-03  6.405e-02   0.024  0.98098   
B4GALNT4..338707.._.AMIGO1..57463.  2.024e-02  2.531e-02   0.800  0.42389   
SEPTIN5..5413.._.SMAP1..60682.      5.658e-02  6.241e-02   0.907  0.36463   
FITM2..128486.._.GAMT..2593.        1.336e-02  2.951e-02   0.453  0.65083   
FITM2..128486.._.GSN..2934.        -6.000e-03  3.372e-02  -0.178  0.85878   
FITM2..128486.._.SEPTIN5..5413.    -4.835e-02  4.767e-02  -1.014  0.31052   
CBX2..84733.._.SMAP1..60682.        1.329e-01  8.615e-02   1.542  0.12305   
CBX2..84733.._.PKIG..11142.         5.264e-02  5.233e-02   1.006  0.31443   
CBX2..84733.._.SHISA4..149345.     -5.621e-03  1.509e-02  -0.373  0.70950   
homocysteine._.PHYH..5264.          1.049e-02  8.043e-02   0.130  0.89625   
homocysteine._.SMAP1..60682.        4.835e-02  1.956e-01   0.247  0.80472   
homocysteine._.NECAB3..63941.       1.364e-01  1.871e-01   0.729  0.46595   
homocysteine._.AMIGO1..57463.       1.685e-01  2.387e-01   0.706  0.48028   
acetylcholine._.NECAB3..63941.      1.690e-02  3.325e-02   0.508  0.61121   
acetylcholine._.AMIGO1..57463.     -5.816e-05  9.883e-02  -0.001  0.99953   
C32.2.PC._.AMIGO1..57463.           3.150e-01  2.796e-01   1.126  0.25999   
C32.0.PC._.AMIGO1..57463.           1.035e-04  3.517e-01   0.000  0.99977   
C34.4.PC._.AMIGO1..57463.          -1.579e-01  2.283e-01  -0.691  0.48931   
C18.0.LPE._.AMIGO1..57463.          2.095e-01  2.646e-01   0.792  0.42835   
C24.1.SM._.AMIGO1..57463.          -1.115e-02  2.138e-01  -0.052  0.95842   
C24.0.SM._.AMIGO1..57463.          -2.720e-02  1.581e-01  -0.172  0.86342   
C36.1.DAG._.AMIGO1..57463.         -1.118e-01  3.112e-01  -0.359  0.71949   
SIK1B..102724428.._.ADCY9..115.     5.077e-02  3.957e-02   1.283  0.19950   
SIK1B..102724428.._.CBX2..84733.    4.303e-02  3.179e-02   1.354  0.17588   
cytidine._.ADCY9..115.             -1.123e-01  1.266e-01  -0.887  0.37506   
Zero hurdle model coefficients (binomial with logit link):
                                     Estimate Std. Error z value Pr(>|z|)    
(Intercept)                          2.079135  29.939609   0.069 0.944636    
AMIGO1..57463.                     -30.874015  12.105395  -2.550 0.010759 *  
NECAB3..63941.                       0.936923   4.974985   0.188 0.850620    
SMAP1..60682.                       -0.929043   3.982997  -0.233 0.815566    
SEPTIN5..5413.                       0.659893   0.878284   0.751 0.452446    
MAPK8IP1..9479.                     -0.858137   0.947433  -0.906 0.365068    
ADCY9..115.                          0.013661   2.448339   0.006 0.995548    
CBX2..84733.                         0.456773   1.287576   0.355 0.722773    
SIK1B..102724428.                   -0.192465   0.494029  -0.390 0.696846    
GAS6..2621.                          0.031562   0.098544   0.320 0.748756    
homocysteine                         5.454361   4.220381   1.292 0.196224    
cytidine                             1.539402   1.352719   1.138 0.255118    
C32.2.PC                            -4.489051   1.354717  -3.314 0.000921 ***
C32.0.PC                            -1.433457   1.936015  -0.740 0.459048    
guanosine                            0.011820   0.552491   0.021 0.982932    
C36.1.PC                             1.431180   1.163173   1.230 0.218543    
phosphocreatine                      0.020681   0.338553   0.061 0.951292    
C18.0.LPE                           -1.995648   1.540468  -1.295 0.195154    
C36.1.DAG                            1.549170   1.736024   0.892 0.372196    
SMAP1..60682.._.PHYH..5264.         -0.327983   0.301092  -1.089 0.276017    
MAPK8IP1..9479.._.SMAP1..60682.      0.188425   0.238522   0.790 0.429546    
NECAB3..63941.._.SMAP1..60682.       1.043884   0.506504   2.061 0.039307 *  
GSN..2934.._.PHYH..5264.             0.036199   0.141388   0.256 0.797934    
GSN..2934.._.SMAP1..60682.          -0.181187   0.142274  -1.274 0.202836    
PKIG..11142.._.NECAB3..63941.        0.017360   0.179187   0.097 0.922821    
PKIG..11142.._.GSN..2934.           -0.038485   0.116187  -0.331 0.740471    
AMIGO1..57463.._.GAMT..2593.        -0.040720   0.160150  -0.254 0.799294    
AMIGO1..57463.._.GSN..2934.          0.388462   0.223258   1.740 0.081864 .  
B4GALNT4..338707.._.AMIGO1..57463.   0.217893   0.079063   2.756 0.005853 ** 
SEPTIN5..5413.._.SMAP1..60682.       0.108015   0.206451   0.523 0.600838    
FITM2..128486.._.GAMT..2593.        -0.011112   0.086218  -0.129 0.897453    
FITM2..128486.._.GSN..2934.          0.132210   0.101923   1.297 0.194577    
FITM2..128486.._.SEPTIN5..5413.     -0.314689   0.173453  -1.814 0.069638 .  
CBX2..84733.._.SMAP1..60682.        -0.285666   0.272613  -1.048 0.294693    
CBX2..84733.._.PKIG..11142.          0.022760   0.142888   0.159 0.873445    
CBX2..84733.._.SHISA4..149345.       0.006423   0.054657   0.118 0.906457    
homocysteine._.PHYH..5264.           0.157223   0.236524   0.665 0.506227    
homocysteine._.SMAP1..60682.        -0.227462   0.599604  -0.379 0.704425    
homocysteine._.NECAB3..63941.       -0.861165   0.836241  -1.030 0.303102    
homocysteine._.AMIGO1..57463.       -0.629865   0.696930  -0.904 0.366116    
acetylcholine._.NECAB3..63941.      -0.007893   0.108694  -0.073 0.942113    
acetylcholine._.AMIGO1..57463.      -0.162061   0.361904  -0.448 0.654296    
C32.2.PC._.AMIGO1..57463.            3.805968   1.109330   3.431 0.000602 ***
C32.0.PC._.AMIGO1..57463.            1.319232   1.272524   1.037 0.299873    
C34.4.PC._.AMIGO1..57463.           -0.126322   0.858608  -0.147 0.883034    
C18.0.LPE._.AMIGO1..57463.           1.311345   1.002159   1.309 0.190697    
C24.1.SM._.AMIGO1..57463.           -0.039473   0.726640  -0.054 0.956678    
C24.0.SM._.AMIGO1..57463.            0.359969   0.555964   0.647 0.517329    
C36.1.DAG._.AMIGO1..57463.          -1.050129   1.134147  -0.926 0.354488    
SIK1B..102724428.._.ADCY9..115.      0.004119   0.127776   0.032 0.974287    
SIK1B..102724428.._.CBX2..84733.     0.126880   0.103629   1.224 0.220813    
cytidine._.ADCY9..115.               0.007360   0.434788   0.017 0.986494    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Number of iterations in BFGS optimization: 64 
Log-likelihood: -372.8 on 104 Df
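The `z value` and `Pr(>|z|)` columns in this summary appear to be Wald statistics. Each estimate is divided by its standard error, and the result is compared with a standard normal distribution. The minimal Python sketch below uses only the standard library. It recomputes a few rows of the zero hurdle part from the printed Estimate and Std. Error. Small differences from the printout come from rounding.

```python
import math

def wald_test(estimate: float, std_error: float) -> tuple[float, float]:
    """Return (z value, two-sided p-value) for a Wald test of H0: beta = 0."""
    z = estimate / std_error
    # Two-sided p-value under N(0, 1): P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Estimate and Std. Error copied from the zero hurdle coefficients above
rows = {
    "AMIGO1..57463.": (-30.874015, 12.105395),
    "C32.2.PC": (-4.489051, 1.354717),
    "B4GALNT4..338707.._.AMIGO1..57463.": (0.217893, 0.079063),
}

for name, (est, se) in rows.items():
    z, p = wald_test(est, se)
    print(f"{name:36s} z = {z:7.3f}  Pr(>|z|) = {p:.6f}")
```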

Analysis Results

MSE variance
2.995536 3.831839

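The results show that the variables selected by p-value do not perform as well as those selected by lasso. The MSE with the lasso-selected variables is only 0.763, whereas the MSE with the p-value-selected variables is close to 3 (2.996).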
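The MSE above presumably compares observed values with the hurdle model's predicted means. This part of the report does not show how those predictions were produced. The sketch below makes two assumptions. First, the zero part is a binomial-logit hurdle, as in the summary above. Second, the count part is a zero-truncated Poisson with a log link; the count family is not shown here, so Poisson is only an assumption. Under these assumptions the predicted mean is \(\hat y = \pi \lambda / (1 - e^{-\lambda})\), where \(\pi\) is the probability of a nonzero response and \(\lambda\) is the Poisson mean. The MSE is then the average squared difference between observed and predicted values. The function names and inputs are hypothetical. The sketch does not reproduce the "variance" column, because its definition is not stated.

```python
import math

def hurdle_poisson_mean(eta_zero: float, eta_count: float) -> float:
    """Expected response of a hurdle model with a logit zero part and an
    (assumed) zero-truncated Poisson count part with log link."""
    pi = 1.0 / (1.0 + math.exp(-eta_zero))  # P(y > 0) from the binomial-logit hurdle
    lam = math.exp(eta_count)               # mean of the untruncated Poisson part
    # E[y | y > 0] for a zero-truncated Poisson is lam / (1 - exp(-lam))
    return pi * lam / (1.0 - math.exp(-lam))

def mse(y_true, y_pred) -> float:
    """Mean squared error between observed values and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical linear predictors (zero part, count part) and observed values
linear_predictors = [(-1.0, 0.2), (1.5, 0.8), (0.3, 0.1), (-2.0, 0.5)]
observed = [0, 2, 1, 0]
predicted = [hurdle_poisson_mean(e0, ec) for e0, ec in linear_predictors]
print(mse(observed, predicted))
```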

Logistic (p-value)

Dependent variable:
bone brain kidney liver lung
(1) (2) (3) (4) (5)
MAGEE1..57692. 3.539
(6.019)
ANKRD18B..441459. 0.650
(0.641)
PNPLA4..8228. 0.680
(0.462)
GAMT..2593. 0.682**
(0.304)
BBS5..129880. -0.753*
(0.446)
DBNDD2..55861. 0.190
(0.138)
CTSF..8722. 0.253* 0.062
(0.145) (0.556)
VAT1..10493. 3.058
(5.429)
IRF5..3663. 0.874
(1.346)
MYH10..4628. 0.210
(0.483)
FECH..2235. 1.642
(2.482)
ISL2..64843. -0.398
(0.615)
DZIP1..22873. -0.322
(0.211)
FMNL1..752. 0.067
(0.140)
RAB38..23682. 0.090
(0.418)
SIMC1..375484. -0.044
(0.293)
SMAP1..60682. 0.167
(1.195)
NLRP1..22861. -0.00002
(0.105)
CYBA..1535. -0.024
(0.213)
BEX3..27018. 0.801
(1.154)
LRP3..4037. 0.581
(0.669)
DOCK4..9732. -0.181
(0.170)
AUTS2..26053. -0.655
(0.567)
ARL4D..379. -0.672
(1.114)
SEMA3B..7869. -0.613
(0.438)
ERMP1..79956. -4.148 0.220
(2.715) (0.598)
PNMA8A..55228. -0.221*
(0.117)
H2BC5..3017. 1.945
(4.908)
H2BC4..8347. -1.265
(3.353)
H2AC6..8334. -1.995
(4.059)
CCDC106..29903. 0.357**
(0.144)
GATA2..2624. -0.026
(0.114)
SERPINB8..5271. 2.954 0.551
(2.519) (0.456)
SLC29A3..55315. 0.299
(0.218)
AREG..374. -0.046 -0.040 0.223
(0.093) (0.078) (0.164)
acetylcarnitine -0.589
(0.784)
aconitate -1.071
(1.410)
isocitrate 1.099
(1.356)
CXCL8..3576. -0.013
(0.088)
C18.2.LPC -0.472 0.822
(0.697) (0.747)
VAV1..7409. -0.116
(0.195)
CPT1B..1375. 0.802
(0.659)
TRAF5..7188. 0.018
(0.147)
thymine -2.779
(1.927)
C18.0.CE 1.136 -0.302
(1.100) (0.641)
arachidonyl.carnitine 0.375
(0.783)
C36.1.DAG -0.630 0.643
(0.771) (0.826)
NADP 0.476
(0.511)
C34.4.PC -2.126***
(0.803)
uridine -4.051**
(1.883)
SPACA6..147650. 0.024
(0.173)
KCNMA1..3778. 0.106
(0.091)
trimethylamine.N.oxide 0.694
(0.578)
thiamine -1.175* -1.262
(0.646) (0.951)
C18.0.LPE 1.382 -0.899
(1.305) (0.589)
C48.3.TAG 0.043
(0.326)
H2BC5..3017….PLEK2..26499. 0.043
(0.027)
H2BC5..3017….OAS3..4940. 0.144
(0.189)
H2BC5..3017….EFNB2..1948. -0.044*
(0.025)
PTGER4..5734….H2BC5..3017. 0.176
(0.294)
EPS8L2..64787….H2BC5..3017. -0.022
(0.136)
H2AC6..8334….OAS3..4940. -0.286
(0.193)
H2AC6..8334….PTGER4..5734. -0.103
(0.266)
H2AC6..8334….EPS8L2..64787. -0.063
(0.156)
H2BC4..8347….OAS3..4940. 0.122
(0.159)
H2BC4..8347….H2BC5..3017. 0.001
(0.105)
H2BC4..8347….PTGER4..5734. -0.052
(0.224)
H2BC4..8347….EPS8L2..64787. 0.146
(0.150)
GPR39..2863….H2BC5..3017. -0.107
(0.174)
GPR39..2863….H2BC4..8347. 0.165
(0.211)
CPT1B..1375….H2BC5..3017. 0.051
(0.210)
CPT1B..1375….H2BC4..8347. -0.104
(0.199)
thymine…H2BC5..3017. -0.061
(0.719)
thymine…H2AC6..8334. 0.594
(0.675)
cytidine…H2BC5..3017. -0.149
(0.536)
cytidine…H2BC4..8347. -0.063
(0.561)
taurodeoxycholate.taurochenodeoxycholate…H2BC5..3017. -0.022
(0.049)
C58.7.TAG…H2BC5..3017. -0.125
(0.094)
VAV1..7409….BEX3..27018. 0.021
(0.026)
CPT1B..1375….SERPINB8..5271. -0.147
(0.118)
cytidine 0.594 1.952 1.273
(3.044) (1.588) (1.263)
pyroglutamic.acid 1.915
(4.056)
inositol -1.297**
(0.632)
ZNF607..84775. 0.139
(0.242)
STEAP1..26872. 0.080
(0.151)
KAZN..23254. -0.670*
(0.386)
MID1..4281. -0.085
(0.315)
C36.1.PC 1.158
(0.827)
niacinamide 0.096
(0.580)
lysine 0.407
(0.712)
C46.0.TAG -0.373 0.039
(0.654) (0.636)
CYBA..1535….DEF6..50619. -0.045**
(0.019)
SMAP1..60682….GPC1..2817. -0.047
(0.131)
CTNNAL1..8727….CYBA..1535. -0.014
(0.028)
MT2A..4502….CYBA..1535. 0.017
(0.015)
uracil 0.643
(0.684)
phosphocreatine -0.423
(0.294)
LRP3..4037….GPC1..2817. -0.203** -0.057
(0.095) (0.049)
DIPK1B..138311….GPC1..2817. -0.012
(0.081)
BEX3..27018….ATP9A..10079. 0.038
(0.058)
BEX3..27018….GPC1..2817. 0.102*
(0.061)
BEX3..27018….SMAP1..60682. 0.059
(0.077)
BEX3..27018….FAXC..84553. -0.075
(0.047)
SMTN..6525….CYBA..1535. 0.016
(0.029)
RASA3..22821….CYBA..1535. -0.001
(0.025)
RASA3..22821….EFNB2..1948. 0.026
(0.032)
GSTM3..2947….IL1R1..3554. -0.091*
(0.050)
GSTM3..2947….LRP3..4037. -0.034
(0.048)
GSN..2934….IL1R1..3554. 0.099*
(0.059)
AUTS2..26053….GPC1..2817. -0.023
(0.074)
AUTS2..26053….GSTM3..2947. 0.024
(0.042)
AUTS2..26053….GSN..2934. -0.007
(0.073)
BBS5..129880….GSTM3..2947. 0.106
(0.077)
CTSF..8722….GPC1..2817. 0.066
(0.080)
CTSF..8722….PCSK1N..27344. -0.025
(0.032)
CTSF..8722….LRP3..4037. 0.127*
(0.072)
CTSF..8722….GSN..2934. -0.068
(0.074)
CTSF..8722….AUTS2..26053. 0.012
(0.069)
EVL..51466….GPC1..2817. 0.031 0.066
(0.075) (0.050)
EVL..51466….FAXC..84553. 0.107
(0.077)
SRC..6714….ATP9A..10079. -0.083
(0.096)
SRC..6714….GPC1..2817. 0.005
(0.090)
SRC..6714….MAP2..4133. -0.019
(0.024)
SRC..6714….SMAP1..60682. -0.101
(0.169)
SRC..6714….LRP3..4037. 0.038
(0.125)
SRC..6714….DIPK1B..138311. 0.050
(0.089)
SRC..6714….BEX3..27018. 0.107
(0.074)
SRC..6714….EVL..51466. -0.024
(0.080)
homocysteine…BEX3..27018. -0.345*
(0.194)
SERPINB8..5271….AREG..374. -0.057
(0.048)
cytidine…ERMP1..79956. -0.608
(0.389)
uridine…ERMP1..79956. 1.185**
(0.464)
C18.0.LPE…SERPINB8..5271. -0.457
(0.420)
C46.2.TAG…ERMP1..79956. 0.034
(0.221)
C48.3.TAG…ERMP1..79956. 0.144
(0.198)
C50.3.TAG…ERMP1..79956. -0.046
(0.204)
SHISA4..149345….AUTS2..26053. -0.096
(0.070)
PCDHGC3..5098….GPC1..2817. 0.049
(0.070)
PCDHGC3..5098….GSN..2934. 0.006
(0.073)
PCDHGC3..5098….AUTS2..26053. 0.030
(0.068)
PCDHGC3..5098….CTSF..8722. -0.093
(0.063)
LTBP3..4054….AUTS2..26053. 0.124
(0.087)
SEMA3B..7869….EVL..51466. -0.022
(0.059)
SMARCD3..6604….SEMA3B..7869. 0.018
(0.044)
DENND3..22898….SEMA3B..7869. -0.012
(0.044)
NCOA7..135112….SEMA3B..7869. 0.062
(0.075)
LTBP3..4054….ERMP1..79956. -0.037
(0.081)
VASN..114990….ERMP1..79956. -0.012
(0.030)
ARL4D..379….SEMA3B..7869. 0.127*
(0.071)
ARL4D..379….ERMP1..79956. 0.028
(0.121)
ARL4D..379….NCOA7..135112. -0.105
(0.078)
ARL4D..379….LTBP3..4054. -0.031
(0.085)
cytidine…ARL4D..379. 0.123
(0.112)
lactose…ARL4D..379. -0.040
(0.094)
C40.6.PC 0.985 1.888** 0.271
(0.856) (0.920) (0.843)
C58.7.TAG -30.061**
(14.913)
butyrobetaine -1.150* -0.712
(0.658) (0.625)
X3.phosphoglycerate -0.911
(0.714)
C52.5.TAG -1.520
(7.510)
C56.6.TAG -0.490
(10.551)
C58.8.TAG 1.442 28.705
(2.390) (19.272)
MYH10..4628….PNPLA4..8228. -0.204**
(0.091)
BEX3..27018….PNPLA4..8228. -0.045*
(0.026)
PKIG..11142….PNPLA4..8228. 0.089**
(0.042)
AS3MT..57412….PNPLA4..8228. 0.251***
(0.070)
AS3MT..57412….CTSF..8722. -0.168***
(0.060)
ANKRD18B..441459….PNPLA4..8228. -0.107
(0.128)
ANKRD18B..441459….GAMT..2593. -0.064
(0.101)
IRF5..3663….GAMT..2593. -0.355***
(0.119)
RAC3..5881….MYH10..4628. 0.004
(0.071)
RAC3..5881….MAGEE1..57692. -0.530**
(0.229)
phosphocreatine…MAGEE1..57692. -0.103
(0.222)
acetylcarnitine…MAGEE1..57692. 0.596
(0.629)
C18.0.CE…MAGEE1..57692. -0.851
(0.813)
beta.alanine…MAGEE1..57692. 0.118
(0.370)
VAT1..10493….FECH..2235. 0.007
(0.148)
AREG..374….TNFRSF1B..7133. 0.044
(0.048)
TNFAIP3..7128….TNFRSF1B..7133. -0.118*
(0.064)
IRF5..3663….MYH10..4628. 0.262**
(0.118)
IRF5..3663….FECH..2235. -0.304
(0.195)
IRF5..3663….VAT1..10493. -0.083
(0.164)
RAC3..5881….IRF5..3663. 0.282**
(0.130)
ARL14..80117….TNFRSF1B..7133. 0.057
(0.077)
ARL14..80117….TNFAIP3..7128. 0.007
(0.035)
cytidine…FECH..2235. -0.199
(0.395)
cytidine…VAT1..10493. 0.100
(0.393)
pyroglutamic.acid…VAT1..10493. -0.421
(0.654)
C58.8.TAG…VAT1..10493. -0.216
(0.382)
C54.4.TAG 4.269***
(1.385)
DZIP1..22873….RAB38..23682. 0.001
(0.067)
ISL2..64843….SPARC..6678. 0.032
(0.046)
ISL2..64843….RAB38..23682. 0.054
(0.129)
ISL2..64843….DZIP1..22873. 0.036
(0.104)
TLR3..7098….DZIP1..22873. 0.104
(0.095)
TLR3..7098….GJA1..2697. -0.006
(0.054)
SIMC1..375484….RAB38..23682. -0.062
(0.111)
GOLGA8A..23015….RAB38..23682. 0.029
(0.066)
GOLGA8A..23015….ISL2..64843. 0.042
(0.105)
MID1..4281….GCH1..2643. -0.098*
(0.058)
COL6A1..1291….TFPI..7035. 0.005
(0.018)
STEAP1..26872….COL6A1..1291. -0.050*
(0.028)
WNK2..65268….MID1..4281. 0.081
(0.064)
CXXC5..51523….LRFN1..57622. 0.015
(0.037)
KAZN..23254….MID1..4281. 0.151
(0.094)
ZNF525..170958….WNK2..65268. -0.052
(0.080)
ZNF525..170958….KAZN..23254. 0.030
(0.111)
C48.3.TAG…C58.7.TAG -3.312
(2.207)
C52.5.TAG…C58.7.TAG 6.178
(4.481)
C52.4.TAG…C58.7.TAG 2.013
(4.405)
C56.6.TAG…C48.3.TAG -0.228
(1.759)
C58.8.TAG…C48.3.TAG 3.567*
(2.144)
C58.8.TAG…C52.5.TAG -5.693
(4.702)
C58.8.TAG…C52.4.TAG -2.626
(4.471)
homocysteine 2.527* 0.457
(1.420) (0.495)
X2.aminoadipate -1.050
(0.642)
Constant -18.226 7.512 -8.853 16.240 -10.756
(35.613) (45.959) (19.115) (17.238) (12.673)
Observations 224 224 224 224 224
Log Likelihood -124.430 -121.498 -127.995 -133.217 -129.782
Akaike Inf. Crit. 344.860 330.997 357.991 354.435 365.565
Note: *p<0.1; **p<0.05; ***p<0.01
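Suppose the table's "Akaike Inf. Crit." is the usual \(\mathrm{AIC} = -2\log L + 2k\). Then the Log Likelihood and AIC rows together imply the number of estimated coefficients \(k\) in each organ's model, intercept included. The sketch below does that arithmetic with values copied from the table. It is a consistency check under that assumption, not a figure reported by the authors.

```python
# Log-likelihood and AIC copied from the table above, columns (1)-(5)
log_lik = {"bone": -124.430, "brain": -121.498, "kidney": -127.995,
           "liver": -133.217, "lung": -129.782}
aic = {"bone": 344.860, "brain": 330.997, "kidney": 357.991,
       "liver": 354.435, "lung": 365.565}

def implied_num_params(ll: float, aic_value: float) -> float:
    """Solve AIC = -2 * logLik + 2 * k for k."""
    return (aic_value + 2.0 * ll) / 2.0

for organ in log_lik:
    k = implied_num_params(log_lik[organ], aic[organ])
    print(f"{organ:6s} k = {k:.1f}")
```

Under that assumption, columns (1) to (5) work out to about 48, 44, 51, 44 and 53 coefficients each. These counts are useful context next to the 224 observations.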

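The table above shows that, with p-value selection, the variables chosen for the different organs overlap more. For example, AREG..374. appears in the bone, brain, and kidney models, yet it is not significant for any of the three organs.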

In terms of significance, no gene, metabolite, or interaction term is significant in all five organ models.
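Both observations above amount to simple set operations. The first is that selected variables recur across organs, which means counting how many organ models use each variable. The second is that no term is significant in all five models, which means intersecting the per-organ sets of significant terms. A small sketch follows. Its variable names and p-values are hypothetical, not values from this report, and the 0.05 cutoff is an assumption.

```python
from collections import Counter

# Hypothetical p-values per organ model (illustration only)
pvalues = {
    "bone":   {"var_a": 0.62, "var_b": 0.03, "var_c": 0.25},
    "brain":  {"var_a": 0.61, "var_c": 0.04},
    "kidney": {"var_a": 0.17, "var_c": 0.75, "var_d": 0.03},
}
ALPHA = 0.05  # assumed significance cutoff

# Overlap: how many organ models use each variable
usage = Counter(var for model in pvalues.values() for var in model)
shared = sorted(var for var, n in usage.items() if n > 1)

# Terms significant in every organ model: intersection of per-organ sets
significant_sets = [{v for v, p in model.items() if p < ALPHA}
                    for model in pvalues.values()]
common_significant = set.intersection(*significant_sets)

print("used in more than one organ:", shared)
print("significant in all organs:", sorted(common_significant))
```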

AUC

bone brain liver lung kidney
0.7774545 0.7470661 0.7392968 0.7599614 0.7668344
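AUC can be read as the probability that a randomly chosen positive case (metastasis to the organ) receives a higher predicted probability than a randomly chosen negative case. The report does not say which package computed these AUCs. The sketch below is the standard rank (Mann-Whitney) formulation in plain Python, with ties counted as one half, applied to made-up scores.

```python
def auc_rank(scores, labels) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative);
    ties count as 1/2. O(n_pos * n_neg), fine for small test sets."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(auc_rank([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```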

Accuracy

bone brain liver lung kidney
0.4666667 0.6304348 0.5 0.5217391 0.6304348
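Accuracy, unlike AUC, depends on the cut-off used to turn predicted probabilities into 0/1 predictions. The threshold used in this report is not stated. The sketch below assumes 0.5 by default and uses made-up scores whose AUC is 0.75. It shows that the same scores give different accuracies at different thresholds.

```python
def accuracy(probs, labels, threshold: float = 0.5) -> float:
    """Share of cases where (prob >= threshold) agrees with the 0/1 label."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum((p >= threshold) == (y == 1) for p, y in zip(probs, labels))
    return correct / len(labels)

probs = [0.1, 0.4, 0.35, 0.8]  # made-up scores, AUC = 0.75
labels = [0, 0, 1, 1]
print(accuracy(probs, labels))        # 0.75 at the default 0.5 cut-off
print(accuracy(probs, labels, 0.38))  # 0.5 with a different cut-off
```

This threshold dependence may partly explain why AUCs of roughly 0.74 to 0.78 sit next to accuracies near 0.5 here. The size and class balance of each test set also matter.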

The accuracy values likewise show that, for every organ, the variables selected by p-value perform worse than those selected by lasso.

Conclusion

These results indicate that, for this type of data, the variables selected by lasso outperform those selected by p-value screening.

References

Bobbitt, Zach. 2024. “How to Handle: glm.fit: Fitted Probabilities Numerically 0 or 1 Occurred.” 2024. https://www.statology.org/glm-fit-fitted-probabilities-numerically-0-or-1-occurred/.
“Depmap.” 2024. 2024. https://depmap.org/metmap/.
Feng, Cindy Xin. 2021. “A Comparison of Zero-Inflated and Hurdle Models for Modeling Zero-Inflated Count Data.” Journal of Statistical Distributions and Applications 8 (1): 8.
Phipson, Belinda. 2024. “RNA-Seq Analysis in R.” 2024. https://combine-australia.github.io/RNAseq-R/06-rnaseq-day1.html#Quality_control.